







 New Architect > Archives > 1999 > 10 > Features

The Future of Distributed Software Development on the Internet

From CVS to WebDAV to Delta-V

By Jim Whitehead

Every day, developers with a shared software vision band together from around the world to develop Open Source software. A similar trend occurs in the corporate world: Large companies with physically dispersed divisions create distributed teams to work together on software projects. Cross-organizational projects also occur with greater frequency, such as a subcontractor working closely with a primary systems-integration contractor on a large project.

These geographically dispersed teams share the same needs for distributed source-code control. When it comes to working on the design documents, test cases, specifications, and source code that comprise the project, individual team members need to work on pieces in isolation, then integrate those pieces with the modifications of their coworkers, without clobbering anyone else's changes. Changes need to be tracked so that errors and exploratory design changes can be undone easily. Tracking creates a group memory of how files have changed over time -- valuable for later reconstruction of detailed design rationales. Released and stable configurations of the project are tracked so they can be regenerated quickly, and so that bug fixes can be made to the appropriate release. These capabilities are all provided by software configuration management (SCM) systems.

SCM systems use a library metaphor to control access to project documents and source code. At first, the SCM repository holds all development files in a "checked-in" state. To work on a file, one needs to check it out, just like taking a book out of a library. Once changes are complete, the file is checked back in, accompanied with brief comments describing the changes. A checked-in file is immutable, and can't be changed again without checking it out.
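The library metaphor can be made concrete with a small sketch. This is an illustrative toy, not the behavior of any particular SCM product: the `Repository` class and its method names are hypothetical, and it assumes a strict model in which only one person may hold a check-out at a time.

```python
class Repository:
    """Toy SCM repository: every file keeps an append-only list of revisions."""

    def __init__(self):
        self._revisions = {}    # filename -> list of (content, comment)
        self._checked_out = {}  # filename -> user currently holding the check-out

    def add(self, name, content, comment="initial revision"):
        self._revisions[name] = [(content, comment)]

    def check_out(self, name, user):
        # Like a library book, a file can be out with only one borrower here.
        if name in self._checked_out:
            holder = self._checked_out[name]
            raise RuntimeError(f"{name} is already checked out by {holder}")
        self._checked_out[name] = user
        return self._revisions[name][-1][0]  # working copy of the latest revision

    def check_in(self, name, user, new_content, comment):
        if self._checked_out.get(name) != user:
            raise RuntimeError(f"{name} is not checked out by {user}")
        # Earlier revisions are never modified; a new one is appended.
        self._revisions[name].append((new_content, comment))
        del self._checked_out[name]
        return len(self._revisions[name])  # 1-based number of the new revision

    def history(self, name):
        return [(i + 1, c) for i, (_, c) in enumerate(self._revisions[name])]


repo = Repository()
repo.add("design.txt", "v1 text")
text = repo.check_out("design.txt", "alice")
rev = repo.check_in("design.txt", "alice", text + " plus edits", "Clarified section 2")
```

Because check-in only appends, the earlier revision stays available for later comparison or rollback.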

Once a change-tracking system is in place, it's possible to view previous revisions of a file and see differences between revisions. Another typical feature is viewing the change history of a file -- listing the modification date for all revisions, the person who made the change, and the comments he or she submitted with the change. It's also possible to discard some revisions -- a useful capability if an exploratory change doesn't work out as intended.

Revision tracking also makes configuration tracking possible. Since any nontrivial software system is composed of multiple source objects, which are described by multiple design and requirements documents, freezing the state of an entire project requires knowing the exact version of each file in the project so that a consistent snapshot can be made. SCM systems provide this capability, allowing users to create baselines that can be used for testing and release tracking. Since all checked-in revisions are immutable, it's possible to revert to a previous project configuration, a critical capability for supporting previously released software projects.

Remote Configuration Management with CVS

Today, the distributed configuration management system of choice for Open Source developers is the Concurrent Versions System (CVS). Currently in use by the Apache HTTP Server and Netscape Communicator (Mozilla) Web-browser Open Source projects, CVS has many advantages for distributed teamwork. Since CVS is itself an Open Source project, it's freely and widely available. In addition to providing typical versioning and configuration-management features, CVS also offers excellent work isolation for team members, and the CVS client/server protocol allows this teamwork to occur remotely. The cvsweb utility allows CVS version histories, old revisions, and differences between revisions to be browsed in a read-only manner on the Web. CVS front ends have been developed for UNIX, PC, and Mac systems, allowing developers from all platforms to participate in a project (see the box titled "Giving CVS a Facelift"). Since many Open Source projects use CVS, there is a large and growing pool of developers who know CVS and understand how to use it for teamwork. In conjunction with an email mailing list, a Web site giving a project overview and documentation, and a bug reporting and tracking system, CVS is a key coordination infrastructure for performing collaborative teamwork via the Internet.

Jim Jagielski's article in this issue on the Apache development process highlights how CVS is used on a successful Open Source development project. Using the CVS update-edit-commit work cycle, Apache developers are able to work on source code on their local machines, thereby isolating themselves from the changes made by other developers. When local changes are complete, they are merged with the intervening modifications of other developers, and then committed to the central development server.

CVS isn't the only configuration-management tool that supports remote development teams. Commercial SCM systems frequently provide this capability, examples being Rational ClearCase MultiSite, Merant PVCS Replicator, and the Continuus/DCM distributed change management product. Other Open Source tools also offer distribution support, a notable one being the Distributed Versioning System (DVS) available at the University of Colorado. These systems are just the tip of the iceberg. The Configuration Management Yellow Pages has an exhaustive listing of existing commercial and Open Source systems (see "Online").

Today: Remote Web Authoring with WebDAV

Exciting new work that's just starting in the Internet Engineering Task Force (IETF) promises to make it easier to perform remote collaborative project work over the Web. The new effort is called Delta-V, and its goal is to provide versioning and configuration management capabilities for the Web by extending the Web's core protocol, HTTP. Using Delta-V, collaborative teams will be able to edit the source code, documents, Web pages, and binary graphics in a project, then record important revisions and manage project configurations -- all in-place on the Web. The Delta-V activity is building upon the work of the WebDAV Distributed Authoring Protocol, an IETF standard that has extended HTTP with operations for remote collaborative authoring on the Web. Delta-V extends HTTP and WebDAV with versioning, isolation of individual changes from collaborators' changes, and SCM capabilities.

The WebDAV protocol, the foundation on which Delta-V is built, extends the Web to make authoring of Web resources as easy as browsing them. Unlike CVS, which downloads files to a local hard drive to retain compatibility with existing applications, with WebDAV Web resources are edited directly on a Web server. Applications must be modified to interact with the Web server using the WebDAV protocol. Though WebDAV is still in the early stages of adoption, Internet Explorer 5 and the Office 2000 suite of applications have already integrated WebDAV support via a feature called "Web Folders," providing remote authoring for Word, Excel, and PowerPoint documents directly on the Web (see the box titled "Web Folders and WebDAV"). Additionally, WebDAV Explorer provides a file-system explorer interface for a WebDAV server. There are many existing WebDAV servers, including the mod_dav module for the Apache server, Microsoft Internet Information Server (IIS) 5, Glyphica PortalWare, Xythos Storage Server, DataChannel RIO, Intraspect Knowledge Server, Digital Creations Zope, CyberTeams WebSite Director lite, and the freely available WebRFM. The IBM DAV4J server, available from AlphaWorks, also provides a Java client API for WebDAV.

WebDAV features are designed to accommodate existing tools, making it straightforward to integrate WebDAV-based remote authoring into them. WebDAV's namespace operations provide the ability to create and list collections, and to copy and move Web resources, thus supporting the needs of "File... Open" and "File... Save" user-interface dialog boxes. Locking of entire Web resources provides overwrite protection for all types of Web resources (HTML pages, GIF images, word processing documents, and source-code text files), and in fact, one of WebDAV's design principles is to provide equal support for all Web-resource types. WebDAV also provides support for storing and retrieving metadata, in the form of attribute-value pairs called properties, associated with a resource. The name of a WebDAV property is a URL, used in this case as a property identifier, not as a locator, and a property value is well-formed XML, gaining XML's advantages for representing structured data and for internationalizing string values.
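As a rough illustration of how a client might ask a server for properties, the sketch below builds the XML body of a WebDAV PROPFIND request with Python's standard library. It assumes the namespace-qualified property naming used in the WebDAV specification, where standard properties live in the `DAV:` namespace. The `build_propfind` helper and the `http://example.com/props/` namespace with its `reviewer` property are hypothetical, and no request is actually sent.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # namespace of the properties defined by the WebDAV specification


def build_propfind(props):
    """Return a PROPFIND request body asking for the given (namespace, name) pairs."""
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for namespace, name in props:
        # Each requested property is an empty element named after the property.
        ET.SubElement(prop, f"{{{namespace}}}{name}")
    return ET.tostring(root, encoding="unicode")


body = build_propfind([
    (DAV_NS, "displayname"),
    (DAV_NS, "getlastmodified"),
    ("http://example.com/props/", "reviewer"),  # hypothetical custom property
])
```

A client would send this body in a PROPFIND request to the resource's URL; the server's reply lists each property with its XML value.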

Early Web-authoring tools encountered the "lost update problem," which occurs when two or more simultaneous authors of a Web page clobber each other's work with successive saves to the same URL without first merging their changes. Although HTTP 1.1 has support for detecting lost updates through unique identifiers associated with the document state, no support is provided for preventing lost updates in the first place. To solve this problem, WebDAV uses long-duration, whole-resource locking as its concurrency control mechanism. The WebDAV protocol provides a write lock, but no read lock capability. On the Web, by default a resource is readable, although it may be protected by access control. Therefore, HTTP doesn't require a Web browser to obtain a lock in order to read a resource, as traditional database locking does; retrofitting the Web with this capability was neither feasible nor desirable. Web servers implement the write operation PUT by saving the contents of the resource in a temporary buffer until the entire new resource has been transmitted, then using internal concurrency control to block read access while the new value is quickly updated. So the traditional database problem of reading a value in an inconsistent state is avoided. Another traditional database issue, deadlock, is also avoided with WebDAV locks. Since locks are granted via a protocol request, with a given request either granted or denied, there's no blocking, and hence no possibility of deadlock.
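The grant-or-deny behavior described above can be sketched as a small lock table. This is a simplified model under stated assumptions, not a server implementation: the `LockTable` class and its method names are hypothetical, locks never expire, and the token merely borrows the `opaquelocktoken:` URI scheme that WebDAV defines for lock tokens.

```python
import uuid


class LockTable:
    """Toy write-lock table: a lock request is granted or denied, never queued."""

    def __init__(self):
        self._locks = {}  # url -> (token, owner)

    def lock(self, url, owner):
        if url in self._locks:
            return None  # denied immediately; the caller is never blocked
        token = f"opaquelocktoken:{uuid.uuid4()}"
        self._locks[url] = (token, owner)
        return token

    def can_write(self, url, token=None):
        held = self._locks.get(url)
        if held is None:
            return True  # unlocked resources stay writable
        return held[0] == token  # only the token holder may overwrite

    def unlock(self, url, token):
        held = self._locks.get(url)
        if held is not None and held[0] == token:
            del self._locks[url]
            return True
        return False


table = LockTable()
alice_token = table.lock("/docs/spec.html", "alice")
bob_token = table.lock("/docs/spec.html", "bob")  # refused while Alice holds the lock
```

Since `lock` returns at once in both cases, no caller ever waits on another caller's lock, which is why deadlock cannot arise. Reads are not modeled at all, mirroring the absence of a read lock.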

WebDAV servers have used differing strategies to implement the features in the protocol -- the major difference is the underlying repository chosen by the server to store properties and resources. Microsoft's IIS 5 server uses the Windows 2000 file system as its repository, and provides an extremely tight integration between file system services and WebDAV services. When a file is locked via WebDAV, it is also locked in the file system, and hence a local user cannot clobber a file locked by a remote user. IIS 5 also uses Windows 2000 user and access-control lists to determine whether a WebDAV user has access to a particular file; there is no separate Web access-control mechanism used by IIS 5. In contrast, the mod_dav Apache module also uses a file system repository, but requires that the Apache server owns all WebDAV authorable files, thus effectively preventing local access to the files. This avoids the need to assume root privileges under UNIX to change the ownership of files -- a security risk -- and lets mod_dav create users that don't have local system accounts, only WebDAV authoring privileges. Restricting local file access prevents another potential problem: Since mod_dav stores properties in a separate database, moving or deleting a file without telling mod_dav results in "ghost" property entries for a resource that no longer exists.

Other WebDAV servers store their information in databases instead of the file system. The Glyphica PortalWare server has created a content management system that sits on top of the Versant object-oriented database system. All documents that are submitted to PortalWare are indexed for full-text searching, and have properties associated with them in the database. The Xythos Storage Server uses a relational database for storage, instead of an object-oriented one. The Xythos server uses standard SQL via JDBC to interface with its database, which, combined with the cross-platform support of databases like Oracle, Sybase, and Informix, lets the Xythos server run cross-platform, and on a variety of databases. Both servers gain several typical database advantages, including transaction support that's useful in implementing WebDAV methods, and good recovery from disasters like power outages and disk failures.

The Future: Web-Based Delta-V

While WebDAV's remote-authoring features are useful for performing remote collaborative authoring, they highlight the need for versioning support to preserve the history of work. The work on Delta-V is intended to fill this role, adding versioning support to WebDAV. Work on Delta-V is ongoing, so details of the protocol may change as the standardization work continues, but there's increasing convergence on its features and benefits. (Figure 1 provides a high-level architecture diagram showing several applications using Delta-V.)

Work is progressing rapidly, driven by working group participants with a deep background in SCM, document management, software environments, and Web portal systems. These participants come from the leading companies in these areas: IBM, Microsoft, Novell, Rational, Merant, DataChannel, Object Technology International, and Dynamic Diagrams, with university participation from U.C. Irvine.

The Delta-V protocol addresses several shortcomings in CVS. The primary advantage of Delta-V is its tight integration with the Web. Using CVS to manage a Web site requires understanding how the file structure managed by CVS maps into URLs served by HTTP, a difficult concept for many users. With Delta-V, Web resources are edited in-place, at a specific URL, and no mapping of filenames to URLs is necessary. Furthermore, the Web-native Delta-V protocol can handle the different types of Web resources better than a file-oriented system like CVS. By versioning Web resources, Delta-V allows HTML links to old revisions of Web pages, creating a sort of time machine for the Web. Linking to a specific revision often can preserve the semantic meaning of a link, such as when linking to a Web-log site that changes frequently, where the linked-to information may be gone in a week. If the site used Delta-V to version its content, these old revisions would still be accessible.

The Delta-V protocol has several unique features. Delta-V assumes that most editing will take place directly on Web resources, which differs from CVS in that there's no local replica. Isolation from the changes of other team members is provided by "workspaces," which provide each collaborator with his or her own view on the resources being edited. Unlike the local replicas that provide isolation in CVS, workspaces isolate collaborators as they work on the remote Web server. Overwrite conflicts are avoided because a resource can be checked out by multiple people simultaneously, and each check out creates a separate working resource. Each collaborator actively working on a resource has a separate virtual working area, identified by his or her workspace, and modifications are made first in a workspace, then merged with the changes of other collaborators.
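The workspace idea can be illustrated with a toy model of one versioned resource. Since the Delta-V protocol was still being drafted, this is only a sketch of the concept, not of the protocol: the `VersionedResource` class, its methods, and the `MergeRequired` exception are all hypothetical, and the merge step itself is left out.

```python
class MergeRequired(Exception):
    """Raised when someone else checked in since this workspace checked out."""


class VersionedResource:
    """Toy resource on which several workspaces can hold check-outs at once."""

    def __init__(self, content):
        self.revisions = [content]
        self.working = {}  # workspace name -> [base revision number, content]

    def check_out(self, workspace):
        # Each check-out creates a separate working resource for that workspace.
        self.working[workspace] = [len(self.revisions), self.revisions[-1]]

    def edit(self, workspace, content):
        self.working[workspace][1] = content

    def read(self, workspace=None):
        # A workspace sees its own working resource; everyone else sees
        # the latest checked-in revision.
        if workspace in self.working:
            return self.working[workspace][1]
        return self.revisions[-1]

    def check_in(self, workspace):
        base, content = self.working[workspace]
        if base != len(self.revisions):
            raise MergeRequired(f"{workspace} must merge newer revisions first")
        self.revisions.append(content)
        del self.working[workspace]
        return len(self.revisions)


page = VersionedResource("draft")
page.check_out("alice")
page.check_out("bob")
page.edit("alice", "alice's text")
page.edit("bob", "bob's text")
alice_rev = page.check_in("alice")
```

After Alice checks in, Bob's working resource is still intact in his workspace, but his own check-in is refused until he merges her revision.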

Another drawback of CVS is its client/server protocol, which is tightly coupled to CVS's repository. Unlike CVS, HTTP and WebDAV have a proven track record of mapping to multiple types of server back-end stores, such as databases, document management systems, and file systems. Delta-V provides a cross-platform integration layer, thus bringing the benefits of remote Web collaboration support to a diverse set of existing back-end repositories that do not currently provide Web authoring or versioning support. Judging by the participants in the working group, the Delta-V protocol will be mapped to SCM systems, document management systems, and content management systems, all of which employ a database to provide their features. This makes the Delta-V protocol a more powerful data integration technology than the CVS client/server protocol, which maps only to the CVS repository.

Delta-V provides versioning of collections, a feature not supported by CVS. When a collection is versioned, collections and their contents follow the check-out/edit/check-in model. When a collection is checked in, its membership is frozen, and can't be changed until the collection is checked out again. Making a new file or deleting an existing file requires the parent collection to be checked out. When all collections in a project are versioned, it's possible to record permanently the membership of each collection for each moment in time, thus making configuration management support possible. Once both collections and their contents are versioned, it's possible to explicitly pick a single revision of each collection and file (often the most recent revision), creating a snapshot of the entire project.

CVS doesn't provide full versioned collection support, leading to odd glitches. As an example, consider renaming a file from A to B. Using CVS, this requires three steps: copying file A's contents into the new location at B; using a cvs add to put B into the CVS repository; and a cvs remove to delete file A. If the collection containing B is reverted to a previous state when A was present but B had not yet been added, the collection will contain both A and B. Since CVS doesn't store previous revisions of collections, it doesn't know when B was added, and so can't revert the collection correctly. Because Delta-V versions collections, it can avoid this problem. Renaming the file in Delta-V would involve checking out the collection to make it editable, moving the file from A to B, and then checking in the collection. If the collection is reverted to the original version, just before the initial check out, it will contain A, but not B, and similarly the following revisions will contain B, but not A. Versioned collections thus provide the foundation for rigorous configuration management.
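The rename example can be traced with a toy versioned collection. This is a conceptual sketch only; the `VersionedCollection` class and its methods are hypothetical and do not correspond to Delta-V protocol operations.

```python
class VersionedCollection:
    """Toy collection whose membership is frozen at each check-in."""

    def __init__(self, members=()):
        self._revisions = [frozenset(members)]
        self._working = None  # editable member set while checked out

    def check_out(self):
        self._working = set(self._revisions[-1])

    def move(self, old, new):
        if self._working is None:
            raise RuntimeError("collection must be checked out first")
        self._working.remove(old)
        self._working.add(new)

    def check_in(self):
        # Record the membership permanently as a new collection revision.
        self._revisions.append(frozenset(self._working))
        self._working = None
        return len(self._revisions)

    def members_at(self, revision):
        return sorted(self._revisions[revision - 1])


src = VersionedCollection({"A"})
src.check_out()
src.move("A", "B")
new_rev = src.check_in()
```

Here `src.members_at(1)` lists only A and `src.members_at(2)` lists only B, so reverting to either revision never yields a collection holding both names.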

Since Delta-V assumes work will take place directly on a Web server, rather than on a local replica, existing WebDAV editing tools, like Office 2000, that are not versioning-aware need to be accommodated. Delta-V can automatically record, as separate revisions, changes to a document made by a versioning-unaware client. Delta-V also divides its functionality into two layers: a simple versioning layer, and a more complex SCM layer. Since authoring clients (word processors, text editors, spreadsheets, and so on) typically work on a single file at a time, they are only expected to use the basic versioning layer to support a check out/edit/check in style of work. The typical authoring client is not expected to provide a user interface for operations like creating and reverting configurations, since a configuration spans an entire project, far greater than their single-file editing scope. A separate SCM control panel application will make use of the features in the SCM layer. This control panel will operate at a collection and project level, providing the capability to create a project configuration or revert to a previous configuration. It will complement the single-file focus of the authoring tools with project-wide capabilities. A full-featured programming environment will be a third class of Delta-V application, one that uses both the versioning and configuration capabilities of Delta-V, providing support for editing individual source-code files, as well as project-level SCM support.

Despite their differences, Delta-V and CVS have much to offer each other. Though Delta-V has been designed for collaborators to work directly on a Web server, it's technically feasible to use the protocol to create local replicas, as in CVS. In fact, though it has not been attempted, it appears to be possible to replace the CVS client/server protocol with Delta-V, and an existing WebDAV client called sitecopy provides a glimpse of how this could be done. The sitecopy utility allows a local file-system directory to be replicated to a remote WebDAV server, so a Web site can be created locally using file-system based authoring tools, then published remotely using the WebDAV protocol. In its remote replication support, sitecopy is similar to the CVS update operation. Though sitecopy and WebDAV don't support versioning, it's not a far stretch to imagine adding bidirectional synchronization, conflict flagging, and versioning operations to sitecopy, thus creating a system that has many of the capabilities of CVS. But why recreate the CVS user interface? It's far better to integrate the Delta-V protocol into CVS, retaining the benefits of CVS without having to learn a new system. Since Delta-V can map to multiple back-end repositories, Delta-V would allow the CVS style of work to be used against multiple repositories, not just with CVS.

The Delta-V protocol opens up several intriguing possibilities for building software systems. These possibilities vary based on where the source code, compiler, and object files are located -- on the remote Delta-V server or on the local machine. If they're all on the local machine, then the build process is very CVS-like, with source code replicated to the local machine before the compiler begins operation, yielding object files that reside locally. But if the source code, compiler, and object files are held remotely, a client would initiate a build by sending a build request to a remote compile server, giving the URL of a makefile and a workspace, storing the object files in the same version-controlled URL hierarchy as the source code. In this scheme, a different compile server could compile each platform variant. While the compiler wouldn't typically be placed on the same machine as the Delta-V server -- so compiles don't adversely affect server performance -- it would be reasonable to place the compile server on the same local storage area network as the Delta-V server. Many interesting configurations are possible for build management using Delta-V, undoubtedly an area where implementations will innovate on different strategies.

With a proven track record based on successful use on a wide range of Open Source projects, CVS is a low-cost, high-value system available today. Looking to the future, the Delta-V protocol melds versioning and SCM with the Web, adding powerful team collaborative work facilities, with the potential for a value-adding integration with CVS. Whether you're looking at the state of things today, or the promise of the future, the implication of these two technologies is clear: It's easier than ever before to assemble a virtual team for remote collaborative project work.


Jim is the Chair of the IETF WebDAV Working Group, and an active participant in the Delta-V Working Group. He is also a Ph.D. student in the Department of Information and Computer Science at the University of California, Irvine. His professional experience includes a position at Raytheon, where he designed firmware in C and Ada for the German civilian air traffic control system (DERD) and for a prototype Microwave Airplane Landing System.








Entire contents copyright 1996-2002 CMP Media LLC