OpenSolaris DSCM Evaluation: Mercurial (Interim Report)

A. Tool Information

Mercurial version 0.8.

B. Configuration Details

1. CPU architecture and machine specifics

kosh$ psrinfo -v
Status of virtual processor 0 as of: 03/30/2006 14:15:16
  on-line since 03/09/2006 16:44:59.
  The i386 processor operates at 1600 MHz,
        and has an i387 compatible floating point processor.
kosh$ uname -a
SunOS kosh 5.11 snv_35 i86pc i386 i86pc
    

2. Memory available

Memory size: 1024 Megabytes
    

3. Size of the repository (space)

TBD. I'm currently creating a repository from the 20060222 snapshot. However, before the end of testing we plan to create a repository that contains the current sources plus history back to the OpenSolaris Launch in June 2005.

4. Number of files in the repository

TBD. The 20060222 snapshot has 34,293 files.

5. Average number of deltas per file

TBD. 1 delta per file for the current test repository.

C. Evaluation Areas

1. Operation Functionality

(Refer to the Requirements Document for detail.)

e1. unbiased and disconnected distribution

Mercurial implements a model of independent repositories, though a repository can be configured to have a de-facto parent. The preferred model for propagating changes is to pull them from the child, rather than pushing them to the parent.

I believe that Mercurial supports updates between two repositories with a common ancestor, but I haven't tested this (TBD).

I believe that the "disconnected-use" requirements are all satisfied, though I haven't tested them (TBD).

e2. networked operation

As mentioned in the requirements document, Mercurial supports remote access via ssh.

e3. interface stability and completeness

storage

The storage representation appears to be well-documented. Certainly there's more information on the Mercurial website than we've been able to find for SCCS's storage representation.

Mercurial's storage representation does not use Unix i-numbers, so snapshots such as those provided by ZFS or Network Appliance filers should not cause problems.

At least some of the on-disk data structures do not appear to be versioned. This is a potential hazard, because at least one storage representation change ("RevlogNG") is planned for Mercurial 0.9.

On-disk data structures are binary files, but I had no problems using the same repository from both SPARC and x86 systems. Binary files give improved performance, but if manual repairs are needed, we'll need a binary editing program.

Mercurial does not provide its own access control mechanism for controlling access to subtrees within a repository. While it might be possible to restrict user access to certain subtrees using filesystem ACLs, it would probably be better to use various pre-operation hooks (e.g., pretxnchangegroup) to implement that sort of control.
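As a rough illustration of the hook-based approach, the check itself could be as small as the sketch below. The function name paths_allowed and the example prefixes are hypothetical, and the sketch assumes the hook has already obtained the list of changed paths and knows which prefixes the committer may touch. Wiring it into pretxnchangegroup is not shown.

```sh
# Hypothetical helper for a pre-transaction hook (not part of Mercurial).
# Reads changed file paths on stdin, one per line, and fails if any path
# lies outside the directory prefixes given as arguments.
# Usage: paths_allowed PREFIX... < list-of-paths
paths_allowed() {
    status=0
    while IFS= read -r path; do
        ok=no
        for prefix in "$@"; do
            case $path in
                "$prefix"/*) ok=yes; break ;;
            esac
        done
        if [ "$ok" = no ]; then
            echo "denied: $path" >&2
            status=1
        fi
    done
    return $status
}
```

A hook would exit non-zero when paths_allowed fails, which (for a hook that can abort the operation) should reject the incoming changes.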

command-line interface, hooks

The command-line and hook interfaces appear to be adequately documented.

One nit: the current documentation appears to reflect the current development version of the code, rather than the most recent release[1]; there is nothing in the documentation to clarify what version it applies to. If OpenSolaris uses Mercurial, we may wish to place snapshots of the code and documentation on opensolaris.org to avoid confusion.

The hook infrastructure invokes the named hook(s) with a few tokens such as the changeset ID passed in via the environment. This means that the hook may need to invoke various Mercurial commands to find out more about the changeset. Presumably Mercurial's designers have thought about lock re-entrancy issues, but this should be verified. Also, it may not always be possible to get the desired information back from the existing Mercurial command-line interfaces. For example, "hg log" gives the old and new names of a renamed file, along with the names of any other files involved in the changeset, but it can't tell you that file "foo" was renamed to "bar".
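To make the point concrete, here is a minimal sketch of how a hook might recover the file list for a changeset by calling back into Mercurial. The function name changeset_files is hypothetical, and the parsing assumes that "hg log -v" prints a single "files:" line and that file names contain no spaces. Note that the result is only a flat list of names, which is exactly why a rename cannot be reconstructed from it.

```sh
# Print the files touched by changeset $1, one per line, by parsing the
# first "files:" line of "hg log -v". Assumes no spaces in file names.
changeset_files() {
    hg log -v -r "$1" |
        sed -n -e '/^files:/{' -e 's/^files:[[:space:]]*//' -e p -e q -e '}' |
        tr ' ' '\n'
}
```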

At least one unexpected behavior was noted while testing: pushing a changeset from repository A to repository B caused A's commit hook to fire. If this is intentional, we'll need to spend some time to make sure we understand the behavior of the hooks that we want to depend on.

network protocol(s)

There is some documentation on the network protocol, though it's a bit sketchy. The protocol is versioned.

e4. standard operations and transactions

TBD (need to do more investigation). Rename support definitely needs work: merges don't track renames, and rename conflicts are not detected.

e5. per-changeset metadata

Mercurial associates a text comment with each changeset; this is added as part of the commit operation. The first line of the comment is displayed as a summary of the changeset for operations like "log", though the full comment can be displayed with the "-v" option. This is somewhat inconsistent with the current conventions for putbacks into ON; we'll want to think about what we want to do.

c6. ease of use

Although I haven't built Mercurial from source, my understanding is that it's pretty straightforward. It requires Python, but I don't believe it requires anything beyond a normal Python distribution.

The primary interface is the hg command, with subcommands for the various operations. This is pretty standard.

The model is straightforward: you commit one or more changesets to the repository that you're working in, then you push or pull them to other repositories. Note that pushing a changeset updates the target repository, but updating the target's source tree is a separate step ("hg update").

Mercurial offers subcommands specifically for generating and accepting source patches.

Mercurial supplies an HTTP server, as well. This can be used for browsing and for pulls over HTTP.

Support for backouts: the revert subcommand can be used to back out all changesets back to a particular revision. Backing out a changeset after the files have been subsequently modified is less straightforward. One suggestion is to generate a source patch for the changeset that you want to back out, then apply the patch using "patch -R".
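The "patch -R" suggestion could be wrapped up roughly as follows. This is a sketch, not a tested procedure: backout_rev is a hypothetical name, and it assumes that "hg export" emits a diff whose paths carry a one-component prefix (hence -p1) and that it is run from the top of the working tree.

```sh
# Reverse-apply the diff for changeset $1 to the current working tree.
# -p1 strips the leading path component ("a/", "b/") from the diff headers.
backout_rev() {
    hg export "$1" | patch -p1 -R
}
```

The result is an uncommitted change in the working tree, so it would still need to be reviewed and committed as a new changeset.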

By default, "hg status" lists files that aren't tracked in the repository (e.g., compiled binaries, editor backup files). This will generate an impossible level of noise in most real-life scenarios with ON. While doing "dmake clobber" will reduce the noise considerably[2], that is inconvenient for a tree the size of ON. The status subcommand does offer options to filter out noise, but it's not clear they can be used to give the desired results (show untracked source files and makefiles, but ignore all other untracked files).
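If the built-in options turn out to be insufficient, one fallback is to post-process the status output. The sketch below is only an illustration: filter_status is a hypothetical name, the suffix list is a guess rather than an ON convention, and it assumes that untracked files are reported on lines beginning with "? ".

```sh
# Filter "hg status" output read from stdin: pass through every line for
# tracked files, but keep untracked ("? ") lines only when the name looks
# like a source file or makefile. The pattern list is illustrative.
filter_status() {
    while IFS= read -r line; do
        case $line in
            '? '*.c|'? '*.h|'? '*.s|'? '*.mk|'? 'Makefile*|'? '*/Makefile*)
                printf '%s\n' "$line" ;;
            '? '*)
                ;;                          # other untracked files: drop
            *)
                printf '%s\n' "$line" ;;    # tracked files: keep
        esac
    done
}
```

Typical use would be "hg status | filter_status". Whether pattern matching on names is precise enough for a tree like ON is an open question.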

Files should be imported with read-write permission. Mercurial keeps track of the permissions, and it complains if you try to update a read-only file (e.g., after a push or pull).

merging

Mercurial's default for resolving conflicts is the hgmerge script. This script checks for the presence of various third-party conflict resolver programs, such as tkdiff. At least some of these programs (e.g., tkdiff) offer functionality that is comparable to what is available with Teamware and Filemerge. It should be easy to add Filemerge support to hgmerge if that's desired.

If hgmerge can't find any of the expected conflict resolvers, it falls back to using diff(1) and patch(1). If patch rejects any part of the diffs, hgmerge invokes an editor so that the user can manually repair the conflict.

While first experimenting with Mercurial, I found it very easy to get my repository into a state where it would keep complaining about "outstanding uncommitted changes", but it was hard to figure out how to get out of that state. (Answer: use "hg update -C".) This probably needs to go into a FAQ.

mismerges

We'll want to think about possible changes to hgmerge to reduce the likelihood of mismerges.

First, if hgmerge uses patch, we may want to force a review of the changes, even if the patch applies cleanly.

Second, the code for invoking the editor and determining whether the conflict was resolved is a bit brittle[3]. This may just be a bug that needs fixing. But we may also want a more explicit "yes I have resolved the conflicts" action from the user (which is something Subversion does).

There is at least one open issue in the Mercurial bug database related to failed merges. Resolving this issue may address the brittleness problem mentioned above.

intermediary snapshots

The current ON convention is that putbacks should not introduce SCCS deltas for intermediary snapshots or Teamware merges. This is achieved by using the redelget and reedit subcommands in wx. Similar functionality is available with Mercurial using the Mercurial Queue (mq) extension.

c7. no-dedicated-server mode

Mercurial can run without any server daemons. ssh support is handled by starting a remote, transient Mercurial server automatically, which communicates with the local system over the ssh connection.

c8. tool community health

Mercurial has an active developer community. At least one developer (Bryan O'Sullivan) has helped with our evaluation of Mercurial, and he is interested in helping to address issues that we have run into so far (e.g., rename).

There have been a few problems with hgmerge, due to the age of Solaris's /bin/sh. These have typically been detected and repaired within a couple weeks. Also, at least one community member has noticed this pattern and suggested a more fundamental fix (rewriting hgmerge in Python).

c9. OpenSolaris community expertise

Mercurial is almost entirely written in Python; hgmerge is a shell script, though there is some talk of rewriting it in Python, too. It's unclear (to me) how much Python expertise there is in the OpenSolaris community. Fortunately, Python appears easy to learn, and the code tends to be more readable than, say, Perl. Commenting is a bit sparse, but I haven't had problems following the code.

c10. interface extensibility

Mercurial has a hooks mechanism as well as a documented extensions mechanism. Some hooks can abort the current operation.

c11. transactional operations and corruption recovery

Mercurial's state files are updated by appending. So presumably corrupted files can be repaired by rolling back to a consistent set of files.
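The general idea behind repairing append-only files can be sketched as follows: note each file's size before a transaction, and truncate back to that size if the transaction fails. This illustrates the technique only; it is not Mercurial's code, and the names journal_begin and journal_rollback are hypothetical.

```sh
# Record the current size of each append-only file in journal file $1.
journal_begin() {
    journal=$1; shift
    : > "$journal"
    for f in "$@"; do
        printf '%s %s\n' "$(( $(wc -c < "$f") ))" "$f" >> "$journal"
    done
}

# Truncate each journaled file back to its recorded size, discarding
# anything appended since journal_begin.
journal_rollback() {
    while read -r size f; do
        head -c "$size" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done < "$1"
}
```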

Signal handling (e.g., SIGINT) appears sensible.

I simulated a crash using SIGKILL during a clone operation and was able to get the workspace into a state where there were many missing files. That is, the files were in the repository, but they were not in the visible source tree. This experiment raised a couple issues that we'll want to look at more carefully.

There also appear to be a couple open issues related to locking[4].

c12. content generality

Email discussions have indicated that Mercurial is supposed to support binary files as well as text. However, the merge code appears to assume text files.
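If the merge code does assume text, a merge script could at least refuse to run a text merge on files that look binary. The sketch below uses the common "contains a NUL byte" heuristic; is_binary is a hypothetical helper, not something hgmerge provides today.

```sh
# Succeed if file $1 contains at least one NUL byte, a common (imperfect)
# heuristic for "this is not a text file".
is_binary() {
    total=$(( $(wc -c < "$1") ))
    text=$(( $(tr -d '\000' < "$1" | wc -c) ))
    [ "$total" -ne "$text" ]
}
```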

o13. partial trees

Not supported.

o14. per-file histories

Changesets are for the entire repository, not per-file. But you can get a per-file history by specifying the file name with "hg log".

2. Storage

Formal evaluation is still TBD. Informal results are that Mercurial does not cause any storage usage spikes.

3. Performance

TBD (still collecting numbers). A local clone of the 20060222 tree takes a couple minutes on the above hardware using ZFS; about twice that on UFS.

D. Changes/Features Required/Desired

Must Have Initially

Want Eventually

Notes

[1] For example, the 2006-03-22 version of the hgrc(5) man page lists hooks that were apparently added after 0.8, although 0.8 is the most recent release.

[2] Besides leaving editor backup files, our clobber builds leave some generated files behind.

[3] The relevant code is

$EDITOR "$LOCAL" "$LOCAL.rej" && test -s "$LOCAL.rej" || exit 0
    

If $EDITOR can't be invoked for some reason, we'll take the "exit 0", which indicates a successful merge.
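One possible tightening of that logic is sketched below. It treats a failed editor invocation as an unresolved merge instead of a success. The function name resolve_rejects is hypothetical, and the sketch keeps the existing convention that an empty or missing .rej file means the user has dealt with the rejects.

```sh
# Return 0 only if the editor ran successfully and no rejects remain.
# $EDITOR is deliberately unquoted so values like "emacs -nw" still work.
resolve_rejects() {
    local_file=$1
    $EDITOR "$local_file" "$local_file.rej" || return 1
    if [ -s "$local_file.rej" ]; then
        return 1    # rejects still present: treat as unresolved
    fi
    return 0
}
```

A caller would then use "resolve_rejects "$LOCAL" || exit 1", so that only an explicit success reports a clean merge.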

[4] issue132 "hg should revalidate its data after locking the repo" and issue154 "race between undo and all readers"

Last change: 2006-03-31 15:01 PST