If you've never used CVS (or any version control system) before, it's easy to get tripped up by some of its underlying assumptions. What seems to cause the most initial confusion about CVS is that it is used for two apparently unrelated purposes: record keeping and collaboration. It turns out, however, that these two functions are closely connected.
Record keeping became necessary because people wanted to compare a program's current state with how it was at some point in the past. For example, in the normal course of implementing a new feature, a developer may bring the program into a thoroughly broken state, where it will probably remain until the feature is mostly finished. Unfortunately, this is just the time when someone usually calls to report a bug in the last publicly released version. To debug the problem (which may also exist in the current version of the sources), the program has to be brought back to a usable state.
Restoring the state poses no difficulty if the source code history is kept under CVS. The developer can simply say, in effect, "Give me the program as it was three weeks ago", or perhaps "Give me the program as it was at the time of our last public release". If you've never had this kind of convenient access to historical snapshots before, you may be surprised at how quickly you come to depend on it. Personally, I always use revision control on my coding projects now – it's saved me many times.
To understand what this has to do with facilitating collaboration, we'll need to take a closer look at the mechanism that CVS provides to help numerous people work on the same project. But before we do that, let's take a look at a mechanism that CVS doesn't provide (or at least, doesn't encourage): file locking. If you've used other version control systems, you may be familiar with the lock-modify-unlock development model, wherein a developer first obtains exclusive write access (a lock) to the file to be edited, makes the changes, and then releases the lock to allow other developers access to the file. If someone else already has a lock on the file, they have to "release" it before you can lock it and start making changes (or, in some implementations, you may "steal" their lock, but that is often an unpleasant surprise for them and not good practice!).
This system is workable if the developers know each other, know who's planning to do what at any given time, and can communicate with each other quickly if someone cannot work because of access contention. However, if the developer group becomes too large or too spread out, dealing with all the locking issues begins to chip away at coding time; it becomes a constant hassle that can discourage people from getting real work done.
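To make the contention problem concrete, here is a minimal sketch of the lock-modify-unlock model. It is a toy: the `LockingStore` class, the developer names, and the file name are all hypothetical, and it does not correspond to any particular version control system.

```python
class LockError(Exception):
    """Raised when a file cannot be locked or unlocked by the caller."""


class LockingStore:
    """Toy lock-modify-unlock store: one writer per file at a time."""

    def __init__(self):
        self.locks = {}  # filename -> name of the developer holding the lock

    def lock(self, filename, developer):
        holder = self.locks.get(filename)
        if holder is not None and holder != developer:
            raise LockError(f"{filename} is locked by {holder}")
        self.locks[filename] = developer

    def unlock(self, filename, developer):
        if self.locks.get(filename) != developer:
            raise LockError(f"{developer} does not hold the lock on {filename}")
        del self.locks[filename]


store = LockingStore()
store.lock("main.c", "A")          # A obtains exclusive write access

try:
    store.lock("main.c", "B")      # B has to wait until A releases the lock
    blocked = False
except LockError:
    blocked = True

store.unlock("main.c", "A")        # A is done
store.lock("main.c", "B")          # now B can proceed
```

Every `LockError` in a model like this stands for a developer who cannot work until someone else acts, which is exactly the cost that grows with the size of the team.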
CVS takes a more mellow approach. Rather than requiring that developers coordinate with each other to avoid conflicts, CVS enables developers to edit simultaneously, assumes the burden of integrating all the changes, and keeps track of any conflicts. This process uses the copy-modify-merge model, which works as follows:
As far as CVS is concerned, all developers on a project are equal. Deciding when to update or when to commit is largely a matter of personal preference or project policy. One common strategy for coding projects is to always update before commencing work on a major change and to commit only when the changes are complete and tested so that the master copy is always in a "runnable" state.
Perhaps you're wondering what happens when developers A and B, each in their own working copy, make different changes to the same area of text and then both commit their changes. This is called a conflict, and CVS notices it as soon as developer B tries to commit. Instead of allowing developer B to proceed, CVS refuses the commit and asks B to update first. When B updates, CVS announces that it has discovered a conflict and places conflict markers (easily recognizable textual flags) at the conflicting location in B's working copy. That location also shows both sets of changes, arranged for easy comparison. Developer B must sort it all out and commit a new revision with the conflict resolved. Perhaps the two developers will need to talk to each other to settle the issue. CVS only alerts the developers that there is a conflict; it's up to human beings to actually resolve it.
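As an illustration of what those markers look like, the sketch below formats a conflicted region in the layout CVS typically writes: the working copy's version first, a separator, then the repository's version labeled with its revision number. The helper `conflict_region`, the file name `hello.c`, and revision `1.2` are made up for the example.

```python
def conflict_region(filename, mine, theirs, their_revision):
    """Format two competing versions of a region with conflict markers."""
    return (
        [f"<<<<<<< {filename}"]          # start of the working copy's version
        + mine
        + ["======="]                    # separator between the two versions
        + theirs
        + [f">>>>>>> {their_revision}"]  # end of the repository's version
    )


region = conflict_region(
    "hello.c",
    mine=["  return 1;"],
    theirs=["  return 0;"],
    their_revision="1.2",
)
print("\n".join(region))
```

Resolving the conflict means editing the file so that only the intended text remains, with all three marker lines removed, and then committing.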
What about the master copy? In official CVS terminology, it is called the project's repository. The repository is simply a file tree kept on a central server. Without going into too much detail about its structure (but see Repository Administration), let's look at what the repository must do to meet the requirements of the checkout-commit-update cycle. Consider the following scenario:
At this point, one of two things can happen. If none of the files edited by developer B have been edited by A, the commit succeeds. However, if CVS realizes that some of B's files are out of date with respect to the repository's latest copies, and those files have also been changed by B in his working copy, CVS informs B that he must do an update before committing those files.
When developer B runs the update, CVS merges all of A's changes into B's local copies of the files. Some of A's work may conflict with B's uncommitted changes, and some may not. Those parts that don't are simply applied to B's copies without further complication, but the conflicting changes must be resolved by B before being committed.
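The commit, rejection, update, and retry sequence can be sketched with a toy model. This is not how CVS is implemented: the `Repo` class tracks a single file, revisions are plain integers, and `merge` is a deliberately simplified three-way merge that assumes edits change lines in place without adding or removing any.

```python
class OutOfDate(Exception):
    """Raised when a commit is attempted from a stale working copy."""


class Repo:
    """Toy single-file repository: a list of revisions, each a list of lines."""

    def __init__(self, lines):
        self.revisions = [list(lines)]

    @property
    def head(self):
        return len(self.revisions) - 1

    def commit(self, base_rev, lines):
        if base_rev != self.head:
            raise OutOfDate("working copy is out of date; update first")
        self.revisions.append(list(lines))
        return self.head


def merge(base, mine, theirs):
    """Line-by-line three-way merge; assumes all three have equal length."""
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t or t == b:
            merged.append(m)      # same on both sides, or only mine changed
        elif m == b:
            merged.append(t)      # only theirs changed
        else:
            merged.append(m)      # both changed differently: keep mine, flag it
            conflicts.append(i)
    return merged, conflicts


repo = Repo(["one", "two", "three"])

# A and B both start from revision 0 and edit different lines.
a_base, a_lines = 0, ["ONE", "two", "three"]
b_base, b_lines = 0, ["one", "two", "THREE"]

a_rev = repo.commit(a_base, a_lines)      # succeeds, creates revision 1

try:
    repo.commit(b_base, b_lines)          # rejected: B is still based on revision 0
    rejected = False
except OutOfDate:
    rejected = True

# B updates: A's change is merged into B's uncommitted edits, then B retries.
b_lines, conflicts = merge(repo.revisions[b_base], b_lines, repo.revisions[repo.head])
b_base = repo.head
b_rev = repo.commit(b_base, b_lines)
```

Here the two edits touch different lines, so `conflicts` comes back empty and B's second commit goes through. Had both developers changed the same line, its index would appear in `conflicts` and B would have to resolve it by hand before committing.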
If developer C does an update now, she'll receive various new changes from the repository: those from A's third commit, and those from B's first successful commit (which might really come from B's second attempt to commit, assuming B's first attempt resulted in B being forced to resolve conflicts).
In order for CVS to serve up changes, in the correct sequence, to developers whose working copies may be out of sync by varying degrees, the repository needs to store all commits since the project's beginning. In practice, the CVS repository stores them all as successive diffs. Thus, even for a very old working copy, CVS is able to calculate the difference between the working copy's files and the current state of the repository, and is thereby able to bring the working copy up to date efficiently. This makes it easy for developers to view the project's history at any point and to revive even very old working copies.
Although, strictly speaking, the repository could achieve the same results by other means, in practice, storing diffs is a simple, intuitive means of implementing the necessary functionality. The process has the added benefit that, by using patch appropriately, CVS can reconstruct any previous state of the file tree and thus bring any working copy from one state to another. It can allow someone to check out the project as it looked at any particular time. It can also show the differences, in diff format, between two states of the tree without affecting someone's working copy.
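The idea of keeping history as successive diffs can be sketched in a few lines. This is an illustration only, not CVS's actual on-disk format: the `DeltaStore` class and its zero-based revision numbers are invented for the example, and Python's standard `difflib` stands in for the diff and patch programs.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Record only the regions where `new` differs from `old`."""
    ops = SequenceMatcher(None, old, new).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(old, delta):
    """Rebuild the newer revision from `old` plus a delta from make_delta."""
    result, pos = [], 0
    for i1, i2, replacement in delta:
        result.extend(old[pos:i1])   # unchanged lines before this edit
        result.extend(replacement)   # inserted or replaced lines
        pos = i2                     # skip the lines the edit removed
    result.extend(old[pos:])         # unchanged tail
    return result


class DeltaStore:
    """Keeps the first revision in full and every later one as a delta."""

    def __init__(self, lines):
        self.first = list(lines)
        self.deltas = []

    def commit(self, lines):
        latest = self.checkout(len(self.deltas))
        self.deltas.append(make_delta(latest, lines))

    def checkout(self, rev):
        """Reconstruct revision `rev` by replaying the first `rev` deltas."""
        lines = self.first
        for delta in self.deltas[:rev]:
            lines = apply_delta(lines, delta)
        return list(lines)


history = DeltaStore(["a", "b", "c"])
history.commit(["a", "B", "c"])
history.commit(["a", "B", "c", "d"])

oldest = history.checkout(0)
newest = history.checkout(2)
```

Because any revision can be rebuilt on demand, the same store answers both "what did the project look like at revision 0?" and "what changed between two revisions?" without touching anyone's working copy.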
Thus, the very features necessary to give convenient access to a project's history are also useful for providing a decentralized, uncoordinated developer team with the ability to collaborate on the project.
For now, you can ignore the details of setting up a repository, administering user access, and navigating CVS-specific file formats (those will be covered in Repository Administration). For the moment, we'll concentrate on how to make changes in a working copy.
But first, here is a quick review of terms: