The Future of Distributed Software Development on the Internet
From CVS to WebDAV to Delta-V
By Jim Whitehead
Every day, developers with a shared software vision band
together from around the world to develop Open Source software.
A similar trend occurs in the corporate world: Large companies
with physically dispersed divisions create distributed teams to
work together on software projects. Cross-organizational
projects also occur with greater frequency, such as a
subcontractor working closely with a primary systems-integration
contractor on a large project.
These geographically dispersed teams share the same needs for
distributed source-code control. When it comes to working on the
design documents, test cases, specifications, and source code
that comprise the project, individual team members need to work
on pieces in isolation, then integrate those pieces with the
modifications of their coworkers, without clobbering anyone
else's changes. Changes need to be tracked so that errors and
exploratory design changes can be undone easily. Tracking
creates a group memory of how files have changed over time --
valuable for later reconstruction of detailed design rationales.
Released and stable configurations of the project are tracked so
they can be regenerated quickly, and so that bug fixes can be
made to the appropriate release. These capabilities are all
provided by software configuration management (SCM) systems.
SCM systems use a library metaphor to control access to
project documents and source code. At first, the SCM repository
holds all development files in a "checked-in" state. To work on
a file, one needs to check it out, just like taking a book out
of a library. Once changes are complete, the file is checked
back in, accompanied by brief comments describing the changes.
A checked-in file is immutable, and can't be changed again
without checking it out.
Once a change-tracking system is in place, it's possible to
view previous revisions of a file and see differences between
revisions. Another typical feature is viewing the change history
of a file -- listing the modification date for all revisions,
the person who made the change, and the comments he or she
submitted with the change. It's also possible to discard some
revisions -- a useful capability if an exploratory change
doesn't work out as intended.
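The library metaphor and the change history it produces can be
sketched in a few lines of Python. This is only an illustration of
the idea, not how any particular SCM system is implemented: the
class and method names are invented for the example, modification
dates are left out, and the exclusive check-out is the strict
library model rather than the more permissive CVS style.

```python
import difflib
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Revision:
    """One immutable, checked-in state of a file."""
    number: int
    author: str
    comment: str
    text: str


@dataclass
class VersionedFile:
    revisions: list = field(default_factory=list)
    checked_out_by: Optional[str] = None

    def check_out(self, user):
        # Like borrowing a library book: only one borrower at a time.
        if self.checked_out_by is not None:
            raise RuntimeError(f"already checked out by {self.checked_out_by}")
        self.checked_out_by = user
        return self.revisions[-1].text if self.revisions else ""

    def check_in(self, user, text, comment):
        # A checked-in file cannot change without a check-out first.
        if self.checked_out_by != user:
            raise RuntimeError("check the file out before changing it")
        rev = Revision(len(self.revisions) + 1, user, comment, text)
        self.revisions.append(rev)
        self.checked_out_by = None
        return rev

    def history(self):
        # Who changed what, and why, for every revision.
        return [(r.number, r.author, r.comment) for r in self.revisions]

    def diff(self, old, new):
        # Differences between two revisions, in unified diff form.
        a = self.revisions[old - 1].text.splitlines()
        b = self.revisions[new - 1].text.splitlines()
        return list(difflib.unified_diff(a, b, f"r{old}", f"r{new}", lineterm=""))
```

A file that has been checked in twice would report two history
entries, and diff(1, 2) would list the lines that the second
author added or removed.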
Revision tracking also makes configuration tracking possible.
Since any nontrivial software system is composed of multiple
source objects, which are described by multiple design and
requirements documents, freezing the state of an entire project
requires knowing the exact version of each file in the project
so that a consistent snapshot can be made. SCM systems provide
this capability, allowing users to create baselines that can be
used for testing and release tracking. Since all checked-in
revisions are immutable, it's possible to revert to a previous
project configuration, a critical capability for supporting
previously released software projects.
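A baseline can be pictured as nothing more than a frozen table
that names one revision of each file. The Python sketch below
assumes a hypothetical store keyed by (path, revision number); the
function names are made up for the example.

```python
from types import MappingProxyType


def create_baseline(latest_revisions):
    """Freeze a {path: revision number} mapping as a read-only snapshot."""
    return MappingProxyType(dict(latest_revisions))


def restore(baseline, store):
    """Rebuild {path: contents} for a baseline.

    `store` maps (path, revision number) to the immutable contents
    of that checked-in revision.
    """
    return {path: store[(path, rev)] for path, rev in baseline.items()}


# Hypothetical repository contents for illustration.
store = {
    ("main.c", 1): "int main() {}",
    ("main.c", 2): "int main() { return 0; }",
    ("README", 1): "docs",
}
release_1 = create_baseline({"main.c": 1, "README": 1})
```

Because the revisions in the store never change, restoring
release_1 gives the same files no matter how many later revisions
of main.c have been checked in.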
Remote Configuration Management with CVS
Today, the distributed configuration management system of
choice for Open Source developers is the Concurrent Versions
System (CVS). Currently in use by the Apache HTTP Server and
Netscape Communicator (Mozilla) Web-browser Open Source
projects, CVS has many advantages for distributed teamwork.
Since CVS is itself an Open Source project, it's freely and
widely available. In addition to providing typical versioning
and configuration-management features, CVS also offers excellent
work isolation for team members, and the CVS client/server
protocol allows this teamwork to occur remotely. The cvsweb
utility allows CVS version histories, old revisions, and
differences between revisions to be browsed in a read-only
manner on the Web. CVS front ends have been developed for UNIX,
PC, and Mac systems, allowing developers from all platforms to
participate on a project (see the box titled "Giving CVS a
Facelift"). Since many Open Source projects use CVS, there is a
large and growing pool of developers who know CVS and understand
how to use it for teamwork. In conjunction with
an email mailing list, a Web site giving project overview and
documentation, and a bug reporting and tracking system, CVS is a
key coordination infrastructure for performing collaborative
teamwork via the Internet.
Jim Jagielski's article in this issue on the Apache
development process highlights how CVS is used on a successful
Open Source development project. Using the CVS
update-edit-commit work cycle, Apache developers are able to
work on source code on their local machines, thereby isolating
themselves from the changes made by other developers. When local
changes are complete, they are merged with the intervening
modifications of other developers, and then committed to the
central development server.
CVS isn't the only configuration-management tool that
supports remote development teams. Commercial SCM systems
frequently provide this capability, examples being Rational
ClearCase MultiSite, Merant PVCS Replicator, and the
Continuus/DCM distributed change management product. Other Open
Source tools also offer distribution support, a notable one
being the Distributed Versioning System (DVS) available at the
University of Colorado. These systems are just the tip of the
iceberg. The Configuration Management Yellow Pages has an
exhaustive listing of existing commercial and Open Source
systems (see "Online").
Today: Remote Web Authoring with WebDAV
Exciting new work that's just starting in the Internet
Engineering Task Force (IETF) promises to make it easier to
perform remote collaborative project work over the Web. The new
effort is called Delta-V, and its goal is to provide versioning
and configuration management capabilities for the Web by
extending the Web's core protocol, HTTP. Using Delta-V,
collaborative teams will be able to edit the source code,
documents, Web pages, and binary graphics in a project, then
record important revisions and manage project configurations --
all in-place on the Web. The Delta-V activity is building upon
the work of the WebDAV Distributed Authoring Protocol, an IETF
standard that has extended HTTP with operations for remote
collaborative authoring on the Web. Delta-V extends HTTP and
WebDAV with versioning, isolation of individual changes from
collaborators' changes, and SCM capabilities.
The WebDAV protocol, the foundation on which Delta-V is
built, extends the Web to make authoring of Web resources as
easy as browsing them. Unlike CVS, which downloads files to a
local hard drive to retain compatibility with existing
applications, with WebDAV Web resources are edited directly on a
Web server. Applications must be modified to interact with the
Web server using the WebDAV protocol. Though WebDAV is still in
the early stages of adoption, Internet Explorer 5 and the Office
2000 suite of applications have already integrated WebDAV
support via a feature called "Web Folders," providing remote
authoring for Word, Excel, and PowerPoint documents directly on
the Web (see the box titled "Web Folders and WebDAV").
Additionally, WebDAV Explorer provides
a file-system explorer interface for a WebDAV server. There are
many existing WebDAV servers, including the mod_dav module for
the Apache server, Microsoft Internet Information Server (IIS)
5, Glyphica PortalWare, Xythos Storage Server, DataChannel RIO,
Intraspect Knowledge Server, Digital Creations Zope, CyberTeams
WebSite Director lite, and the freely available WebRFM. The IBM
DAV4J server, available from AlphaWorks, also provides a Java
client API for WebDAV.
WebDAV features are designed to accommodate existing tools,
making it straightforward to integrate WebDAV-based remote
authoring into them. WebDAV's namespace operations provide the
ability to create and list collections, and to copy and move Web
resources, thus supporting the needs of "File... Open" and
"File... Save" user-interface dialog boxes. Locking of entire
Web resources provides overwrite protection for all types of Web
resources (HTML pages, GIF images, word processing documents,
and source-code text files), and in fact, one of WebDAV's design
principles is to provide equal support for all Web-resource
types. WebDAV also provides support for storing and retrieving
metadata, in the form of attribute-value pairs called
properties, associated with a resource. The name of a WebDAV
property is a URL, used in this case as a property identifier,
not as a locator, and a property value is well-formed XML,
gaining XML's advantages for representing structured data and
for internationalizing string values.
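To make the property model concrete, here is a small Python
sketch that builds the XML body of a PROPPATCH request and reads
the properties back out. It assumes the form used in the published
WebDAV specification, in which a property is named by an XML
namespace plus an element name; the example.com namespace and the
"reviewer" property are invented for illustration, and no request
is actually sent.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"  # namespace of the protocol's own elements
EXAMPLE_NS = "http://example.com/props/"  # hypothetical namespace for custom properties


def proppatch_body(properties):
    """Build a PROPPATCH body setting each (namespace, name) -> value."""
    root = ET.Element(f"{{{DAV}}}propertyupdate")
    set_el = ET.SubElement(root, f"{{{DAV}}}set")
    prop = ET.SubElement(set_el, f"{{{DAV}}}prop")
    for (namespace, name), value in properties.items():
        ET.SubElement(prop, f"{{{namespace}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def read_properties(xml_text):
    """Collect {(namespace, name): value} from every DAV: prop element."""
    found = {}
    for prop in ET.fromstring(xml_text).iter(f"{{{DAV}}}prop"):
        for child in prop:
            # ElementTree tags look like "{namespace}name".
            namespace, _, name = child.tag[1:].partition("}")
            found[(namespace, name)] = child.text
    return found
```

Since the value travels as XML, a non-ASCII string such as a
reviewer's accented name survives the round trip unchanged.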
Early Web-authoring tools encountered the "lost update
problem," which occurs when two or more simultaneous authors of
a Web page clobber each other's work with successive saves to
the same URL without first merging their changes. Although HTTP
1.1 has support for detecting lost updates through unique
identifiers associated with the document state, no support is
provided for preventing lost updates in the first place. To
solve this problem, WebDAV uses long-duration, whole resource
locking as its concurrency control mechanism. The WebDAV
protocol provides a write lock, but no read lock capability. On
the Web, by default a resource is readable, although it may be
protected by access control. HTTP therefore doesn't require that
a Web browser obtain a lock in order to read a resource, as
traditional database locking does, and retrofitting the Web with
this capability was neither feasible nor desirable. Web
servers implement the write operation PUT by saving the contents
of the resource in a temporary buffer until the entire new
resource has been transmitted, then using internal concurrency
control to block read access while the new value is quickly
updated. So the traditional database problem of reading a value
in an inconsistent state is avoided. Another traditional
database issue, deadlock, is also avoided with WebDAV locks.
Since locks are granted via a protocol request, with a given
request either granted or denied, there's no blocking, and hence
no possibility of deadlock.
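The grant-or-deny behavior can be modeled with a simple lock
table. The Python sketch below is an assumption-laden toy, not a
server implementation: the class, its method names, and the
600-second default timeout are invented, and the caller passes in
the current time so the example stays deterministic.

```python
import uuid


class LockTable:
    """Whole-resource write locks that are granted or denied at once."""

    def __init__(self):
        self._locks = {}  # url -> (token, owner, expires_at)

    def _active(self, url, now):
        held = self._locks.get(url)
        return held if held and held[2] > now else None

    def lock(self, url, owner, now, timeout=600):
        if self._active(url, now):
            return None  # denied immediately; the requester is never queued
        token = f"opaquelocktoken:{uuid.uuid4()}"
        self._locks[url] = (token, owner, now + timeout)
        return token

    def may_write(self, url, token, now):
        # Writes need the matching token while a lock is active.
        # Reads never consult this table at all.
        held = self._active(url, now)
        return held is None or held[0] == token

    def unlock(self, url, token):
        held = self._locks.get(url)
        if held and held[0] == token:
            del self._locks[url]
            return True
        return False
```

A second author who asks for the lock while it is held simply
gets a refusal and can retry later, which is why no request ever
waits on another and deadlock cannot arise.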
WebDAV servers have used differing strategies to implement
the features in the protocol -- the major difference is the
underlying repository chosen by the server to store properties
and resources. Microsoft's IIS 5 server uses the Windows 2000
file system as its repository, and provides an extremely tight
integration between file system services and WebDAV services.
When a file is locked via WebDAV, it is also locked in the file
system, and hence a local user cannot clobber a file locked by a
remote user. IIS 5 also uses Windows 2000 user and
access-control lists to determine whether a WebDAV user has
access to a particular file; there is no separate Web
access-control mechanism used by IIS 5. The mod_dav Apache
module also uses a file system repository but, in contrast,
requires that the Apache server own all WebDAV-authorable files,
thus effectively preventing local access to the files. This avoids
the need to assume root privileges under UNIX to change the
ownership of files -- a security risk -- and lets mod_dav create
users that don't have local system accounts, only WebDAV
authoring privileges. Restricting local file access prevents
another potential problem: Since mod_dav stores properties in a
separate database, moving or deleting a file without telling
mod_dav results in "ghost" property entries for a resource that
no longer exists.
Other WebDAV servers store their information in databases
instead of the file system. The Glyphica PortalWare server has
created a content management system that sits on top of the
Versant object-oriented database system. All documents that are
submitted to PortalWare are indexed for full-text searching, and
have properties associated with them in the database. The Xythos
Storage Server uses a relational database for storage, instead
of an object-oriented one. The Xythos server uses standard SQL
via JDBC to interface with its database, which, combined with
the cross-platform support of databases like Oracle, Sybase, and
Informix, lets the Xythos server run cross-platform, and on a
variety of databases. Both servers gain several typical database
advantages, including transaction support that's useful in
implementing WebDAV methods, and good recovery from disasters
like power outages and disk failures.
The Future: Web-Based Delta-V
While WebDAV's remote-authoring features are useful for
performing remote collaborative authoring, they highlight the
need for versioning support to preserve the history of work. The
work on Delta-V is intended to fill this role, adding versioning
support to WebDAV. Work on Delta-V is ongoing, so details of the
protocol may change as the standardization work continues, but
there's increasing convergence on its features and benefits.
(Figure 1 provides a high-level architecture diagram showing
several applications using Delta-V.)
Work is progressing rapidly, driven by working group
participants with a deep background in SCM, document management,
software environments, and Web portal systems. These
participants come from the leading companies in these areas:
IBM, Microsoft, Novell, Rational, Merant, DataChannel, Object
Technology International, and Dynamic Diagrams, with university
participation from U.C. Irvine.
The Delta-V protocol addresses several shortcomings in CVS.
The primary advantage of Delta-V is its tight integration with
the Web. Using CVS to manage a Web site requires understanding
how the file structure managed by CVS maps into URLs served by
HTTP, a difficult concept for many users. With Delta-V, Web
resources are edited in-place, at a specific URL, and no mapping
of filenames to URLs is necessary. Furthermore, the Web-native
Delta-V protocol can handle the different types of Web resources
better than a file-oriented system like CVS. By versioning Web
resources, Delta-V allows HTML links to old revisions of Web
pages, creating a sort of time machine for the Web. Linking to a
specific revision often can preserve the semantic meaning of a
link, such as when linking to a Web-log site that changes
frequently, where the linked-to information may be gone in a
week. If the site used Delta-V to version its content, these old
revisions would still be accessible.
The Delta-V protocol has several unique features. Delta-V
assumes that most editing will take place directly on Web
resources, which differs from CVS in that there's no local
replica. Isolation from the changes of other team members is
provided by "workspaces," which provide each collaborator with
his or her own view on the resources being edited. Unlike the
local replicas that provide isolation in CVS, workspaces isolate
collaborators as they work on the remote Web server. Overwrite
conflicts are avoided because a resource can be checked out by
multiple people simultaneously, and each check out creates a
separate working resource. Each collaborator actively working on
a resource has a separate virtual working area, identified by
his or her workspace, and modifications are made first in a
workspace, then merged with the changes of other collaborators.
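Since the protocol details may still change, the following
Python sketch should be read only as a toy model of the workspace
idea. All names are hypothetical, the "merge" is supplied by the
caller, and everything lives in one object standing in for the
remote server.

```python
class VersionedResource:
    """One URL whose edits happen in per-workspace working resources."""

    def __init__(self, text):
        self.revisions = [text]  # checked-in revisions, oldest first
        self._working = {}       # workspace -> [base revision index, text]

    def check_out(self, workspace):
        # Each check-out creates a separate working resource.
        base = len(self.revisions) - 1
        self._working[workspace] = [base, self.revisions[base]]

    def read(self, workspace=None):
        # A workspace sees its own working resource; everyone else
        # sees the latest checked-in revision.
        if workspace in self._working:
            return self._working[workspace][1]
        return self.revisions[-1]

    def write(self, workspace, text):
        self._working[workspace][1] = text

    def check_in(self, workspace):
        base, text = self._working[workspace]
        if base != len(self.revisions) - 1:
            return False  # someone else checked in first; merge needed
        self.revisions.append(text)
        del self._working[workspace]
        return True

    def merge(self, workspace, merged_text):
        # Record the merged result against the newest revision.
        self._working[workspace] = [len(self.revisions) - 1, merged_text]
```

Two collaborators can check out the same resource, edit without
seeing each other's changes, and the second one to check in is
asked to merge rather than silently overwriting the first.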
Another drawback of CVS is its client/server protocol, which
is tightly coupled to CVS's repository. Unlike CVS, HTTP and
WebDAV have a proven track record of mapping to multiple types
of server back-end stores, such as databases, document
management systems, and file systems. Delta-V provides a
cross-platform integration layer, thus bringing the benefits of
remote Web collaboration support to a diverse set of existing
back-end repositories that do not currently provide Web
authoring or versioning support. Judging by the participants in
the working group, the Delta-V protocol will be mapped to SCM
systems, document management systems, and content management
systems, all of which employ a database to provide their
features. This makes the Delta-V protocol a more powerful data
integration technology than the CVS client/server protocol,
which maps only to the CVS repository.
Delta-V provides versioning of collections, a feature not
supported by CVS. A versioned collection and its contents
follow the check-out/edit/check-in model.
When a collection is checked in, its membership is frozen, and
can't be changed until the collection is checked out again.
Making a new file or deleting an existing file requires the
parent collection to be checked out. When all collections in a
project are versioned, it's possible to record permanently the
membership of each collection for each moment in time, thus
making configuration management support possible. Once both
collections and their contents are versioned, it's possible to
explicitly pick a single revision of each collection and file
(often the most recent revision), creating a snapshot of the
entire project.
CVS doesn't provide full versioned collection support,
leading to odd glitches. As an example, consider renaming a file
from A to B. Using CVS, this requires three steps: copying file
A's contents into the new location at B; using cvs add to put B
into the CVS repository; and using cvs remove to delete file A.
If the collection containing B were reverted to a previous state
in which A was present but B had not yet been added, the
collection would contain both A and B.
Since CVS doesn't store previous revisions of collections, it
doesn't know when B was added, and so can't revert the
collection correctly. Because Delta-V versions collections, it
can avoid this problem. Renaming the file in Delta-V would
involve checking out the collection to make it editable, moving
the file from A to B, and then checking in the collection. If
the collection is reverted to the original version, just before
the initial check out, it will contain A, but not B, and
similarly the following revisions will contain B, but not A.
Versioned collections thus provide the foundation for rigorous
configuration management.
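The rename example can be traced with a minimal Python model of a
versioned collection. This is a sketch of the concept only; the
class and its methods are invented and do not correspond to
Delta-V protocol operations.

```python
class VersionedCollection:
    """A collection whose membership is frozen while checked in."""

    def __init__(self, members=()):
        self.revisions = [frozenset(members)]  # one entry per check-in
        self._working = None                   # editable copy while checked out

    def check_out(self):
        self._working = set(self.revisions[-1])

    def move(self, old, new):
        if self._working is None:
            raise RuntimeError("collection is checked in; membership is frozen")
        self._working.remove(old)
        self._working.add(new)

    def check_in(self):
        # Record the new membership permanently and freeze it again.
        self.revisions.append(frozenset(self._working))
        self._working = None
        return len(self.revisions) - 1


project = VersionedCollection({"A"})
project.check_out()
project.move("A", "B")
project.check_in()
```

Afterward, project.revisions[0] holds only A and
project.revisions[1] holds only B, so reverting to either revision
yields exactly the membership that existed at that time.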
Since Delta-V assumes work will take place directly on a Web
server, rather than on a local replica, existing WebDAV editing
tools, like Office 2000, that are not versioning-aware need to
be accommodated. Delta-V can automatically record, as separate
revisions, changes to a document made by a versioning-unaware
client. Delta-V also divides its functionality into two layers:
a simple versioning layer, and a more complex SCM layer. Since
authoring clients (word processors, text editors, spreadsheets,
and so on) typically work on a single file at a time, they are
only expected to use the basic versioning layer to support a
check out/edit/check in style of work. The typical authoring
client is not expected to provide a user interface for
operations like creating and reverting configurations, since a
configuration spans an entire project, far greater than their
single-file editing scope. A separate SCM control panel
application will make use of the features in the SCM layer. This
control panel will operate at a collection and project level,
providing the capability to create a project configuration or
revert to a previous configuration. It will complement the
single-file focus of the authoring tools with project-wide
capabilities. A full-featured programming environment will be a
third class of Delta-V application, one that uses both the
versioning and configuration capabilities of Delta-V, providing
support for editing individual source-code files, as well as
project-level SCM support.
Despite their differences, Delta-V and CVS have much to offer
each other. Though Delta-V has been designed for collaborators
to work directly on a Web server, it's technically feasible to
use the protocol to create local replicas, as in CVS. In fact,
though it has not been attempted, it appears to be possible to
replace the CVS client/server protocol with Delta-V, and an
existing WebDAV client called sitecopy provides a glimpse of how
this could be done. The sitecopy utility allows a local
file-system directory to be replicated to a remote WebDAV
server, so a Web site can be created locally using file-system
based authoring tools, then published remotely using the WebDAV
protocol. In its remote replication support, sitecopy is similar
to the CVS update operation. Though sitecopy and WebDAV don't
support versioning, it's not a far stretch to imagine adding
bidirectional synchronization, conflict flagging, and versioning
operations to sitecopy, thus creating a system that has many of
the capabilities of CVS. But why recreate the CVS user
interface? It's far better to integrate the Delta-V protocol
into CVS, retaining the benefits of CVS without requiring users
to learn a new system. Since Delta-V can map to multiple back-end
repositories, Delta-V would allow the CVS style of work to be
used against multiple repositories, not just with CVS.
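The one-way replication step described above, publishing a local
directory to a remote WebDAV server, boils down to deciding which
files to upload and which to delete. The Python sketch below
shows one way to make that decision by comparing content digests.
It is an assumption for illustration and not a description of how
sitecopy itself detects changes; the function names and the use of
SHA-256 are choices made for the example, and no network requests
are issued.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint used to detect changed files."""
    return hashlib.sha256(data).hexdigest()


def plan_publish(local_files, remote_digests):
    """List the requests needed to make the remote copy match the local one.

    local_files:    {path: file contents as bytes}
    remote_digests: {path: digest recorded at the last publish}
    """
    actions = []
    for path in sorted(local_files):
        # Upload anything new or changed since the last publish.
        if remote_digests.get(path) != digest(local_files[path]):
            actions.append(("PUT", path))
    for path in sorted(set(remote_digests) - set(local_files)):
        # Remove remote files that no longer exist locally.
        actions.append(("DELETE", path))
    return actions
```

Adding versioning and two-way synchronization would mean keeping
a base revision per file as well, so that changes made on both
sides could be flagged as conflicts instead of overwritten.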
The Delta-V protocol opens up several intriguing
possibilities for building software systems. These possibilities
vary based on where the source code, compiler, and object files
are located -- on the remote Delta-V server or on the local
machine. If they're all on the local machine, then the build
process is very CVS-like, with source code replicated to the
local machine before the compiler begins operation, yielding
object files that reside locally. But if the source code,
compiler, and object files are held remotely, a client would
initiate a build by sending a build request to a remote compile
server, giving the URL of a makefile and a workspace, storing
the object files in the same version-controlled URL hierarchy as
the source code. In this scheme, a different compile server
could compile each platform variant. While the compiler wouldn't
typically be placed on the same machine as the Delta-V server --
so compiles don't adversely affect server performance -- it
would be reasonable to place the compile server on the same
local storage area network as the Delta-V server. Many
interesting configurations are possible for build management
using Delta-V, undoubtedly an area where implementations will
innovate on different strategies.
With a proven track record based on successful use on a wide
range of Open Source projects, CVS is a low-cost, high-value
system available today. Looking to the future, the Delta-V
protocol melds versioning and SCM with the Web, adding powerful
team collaborative work facilities, with the potential for a
value-adding integration with CVS. Whether you're looking at the
state of things today, or the promise of the future, the
implication of these two technologies is clear: It's easier than
ever before to assemble a virtual team for remote collaborative
project work.
Jim is the Chair of the IETF WebDAV Working Group, and an
active participant in the Delta-V Working Group. He is also a
Ph.D. student in the Department of Information and Computer
Science at the University of California, Irvine. Professional
experience includes a position at Raytheon, where he designed
firmware in C and Ada for the German civilian air traffic
control system (DERD) and for a prototype Microwave Airplane
Landing System.