Next: , Up: An Overview of CVS

Basic Concepts

If you've never used CVS (or any version control system) before, it's easy to get tripped up by some of its underlying assumptions. What seems to cause the most initial confusion about CVS is that it is used for two apparently unrelated purposes: record keeping and collaboration. It turns out, however, that these two functions are closely connected.

Record keeping became necessary because people wanted to compare a program's current state with how it was at some point in the past. For example, in the normal course of implementing a new feature, a developer may bring the program into a thoroughly broken state, where it will probably remain until the feature is mostly finished. Unfortunately, this is just the time when someone usually calls to report a bug in the last publicly released version. To debug the problem (which may also exist in the current version of the sources), the program has to be brought back to a useable state.

Restoring the state poses no difficulty if the source code history is kept under CVS. The developer can simply say, in effect, "Give me the program as it was three weeks ago", or perhaps "Give me the program as it was at the time of our last public release". If you've never had this kind of convenient access to historical snapshots before, you may be surprised at how quickly you come to depend on it. Personally, I always use revision control on my coding projects now – it's saved me many times.

To understand what this has to do with facilitating collaboration, we'll need to take a closer look at the mechanism that CVS provides to help numerous people work on the same project. But before we do that, let's take a look at a mechanism that CVS doesn't provide (or at least, doesn't encourage): file locking. If you've used other version control systems, you may be familiar with the lock-modify-unlock development model, wherein a developer first obtains exclusive write access (a lock) to the file to be edited, makes the changes, and then releases the lock to allow other developers access to the file. If someone else already has a lock on the file, they have to "release" it before you can lock it and start making changes (or, in some implementations, you may "steal" their lock, but that is often an unpleasant surprise for them and not good practice!).

This system is workable if the developers know each other, know who's planning to do what at any given time, and can communicate with each other quickly if someone cannot work because of access contention. However, if the developer group becomes too large or too spread out, dealing with all the locking issues begins to chip away at coding time; it becomes a constant hassle that can discourage people from getting real work done.

CVS takes a more mellow approach. Rather than requiring that developers coordinate with each other to avoid conflicts, CVS enables developers to edit simultaneously, assumes the burden of integrating all the changes, and keeps track of any conflicts. This process uses the copy-modify-merge model, which works as follows:

  1. Developer A requests a working copy (a directory tree containing the files that make up the project) from CVS. This is also known as "checking out" a working copy, like checking a book out of the library.
  2. Developer A edits freely in her working copy. At the same time, other developers may be busy in their own working copies. Because these are all separate copies, there is no interference – it is as though all of the developers have their own copy of the same library book, and they're all at work scribbling comments in the margins or rewriting certain pages independently.
  3. Developer A finishes her changes and commits them into CVS along with a "log message", which is a comment explaining the nature and purpose of the changes. This is like informing the library of what changes she made to the book and why. The library then incorporates these changes into a "master" copy, where they are recorded for all time.
  4. Meanwhile, other developers can have CVS query the library to see if the master copy has changed recently. If it has, CVS automatically updates their working copies. (This part is magical and wonderful, and I hope you appreciate it. Imagine how different the world would be if real books worked this way!)

As far as CVS is concerned, all developers on a project are equal. Deciding when to update or when to commit is largely a matter of personal preference or project policy. One common strategy for coding projects is to always update before commencing work on a major change and to commit only when the changes are complete and tested so that the master copy is always in a "runnable" state.

Perhaps you're wondering what happens when developers A and B, each in their own working copy, make different changes to the same area of text and then both commit their changes? This is called a conflict, and CVS notices it as soon as developer B tries to commit changes. Instead of allowing developer B to proceed, CVS announces that it has discovered a conflict and places conflict markers (easily recognizable textual flags) at the conflicting location in his copy. That location also shows both sets of changes, arranged for easy comparison. Developer B must sort it all out and commit a new revision with the conflict resolved. Perhaps the two developers will need to talk to each other to settle the issue. CVS only alerts the developers that there is a conflict; it's up to human beings to actually resolve it.

What about the master copy? In official CVS terminology, it is called the project's repository. The repository is simply a file tree kept on a central server. Without going into too much detail about its structure (but see Repository Administration), let's look at what the repository must do to meet the requirements of the checkout-commit-update cycle. Consider the following scenario:

  1. Two developers, A and B, check out working copies of a project at the same time. The project is at its starting point – no changes have been committed by anyone yet, so all the files are in their original, pristine state.
  2. Developer A gets right to work and soon commits her first batch of changes.
  3. Meanwhile, developer B watches television.
  4. Developer A, hacking away like there's no tomorrow, commits her second batch of changes. Now, the repository's history contains the original files, followed by A's first batch of changes, followed by this set of changes.
  5. Meanwhile, developer B plays video games.
  6. Suddenly, developer C joins the project and checks out a working copy from the repository. Developer C's copy reflects A's first two sets of changes, because they were already in the repository when C checked out her copy.
  7. Developer A, continuing to code as one possessed by spirits, completes and commits her third batch of changes.
  8. Finally, blissfully unaware of the recent frenzy of activity, developer B decides it's time to start work. He doesn't bother to update his copy; he just commences editing files, some of which may be files that A has worked in. Shortly thereafter, developer B commits his first changes.

At this point, one of two things can happen. If none of the files edited by developer B have been edited by A, the commit succeeds. However, if CVS realizes that some of B's files are out of date with respect to the repository's latest copies, and those files have also been changed by B in his working copy, CVS informs B that he must do an update before committing those files.

When developer B runs the update, CVS merges all of A's changes into B's local copies of the files. Some of A's work may conflict with B's uncommitted changes, and some may not. Those parts that don't are simply applied to B's copies without further complication, but the conflicting changes must be resolved by B before being committed.

If developer C does an update now, she'll receive various new changes from the repository: those from A's third commit, and those from B's first successful commit (which might really come from B's second attempt to commit, assuming B's first attempt resulted in B being forced to resolve conflicts).

In order for CVS to serve up changes, in the correct sequence, to developers whose working copies may be out of sync by varying degrees, the repository needs to store all commits since the project's beginning. In practice, the CVS repository stores them all as successive diffs. Thus, even for a very old working copy, CVS is able to calculate the difference between the working copy's files and the current state of the repository, and is thereby able to bring the working copy up to date efficiently. This makes it easy for developers to view the project's history at any point and to revive even very old working copies.

Although, strictly speaking, the repository could achieve the same results by other means, in practice, storing diffs is a simple, intuitive means of implementing the necessary functionality. The process has the added benefit that, by using patch appropriately, CVS can reconstruct any previous state of the file tree and thus bring any working copy from one state to another. It can allow someone to check out the project as it looked at any particular time. It can also show the differences, in diff format, between two states of the tree without affecting someone's working copy.

Thus, the very features necessary to give convenient access to a project's history are also useful for providing a decentralized, uncoordinated developer team with the ability to collaborate on the project.

For now, you can ignore the details of setting up a repository, administering user access, and navigating CVS-specific file formats (those will be covered in Repository Administration). For the moment, we'll concentrate on how to make changes in a working copy.

But first, here is a quick review of terms:

Karl Fogel wrote this book. Buy a printed copy via his homepage at

copyright  ©  November 12 2019 sean dreilinger url: