1. Introduction

Git is a distributed source code management tool. It was originally developed by Linus Torvalds when he could no longer use BitKeeper to manage the Linux kernel sources. It is currently maintained by Junio C Hamano. It has an active community of users, and has been adopted by a few other open source projects.

Git was designed to be low-level and fast. Higher-level interfaces are available, the most popular being Cogito, a set of front-end scripts layered on top of Git.

Git is open source, and distributed under the GNU General Public License, version 2.

2. Version used

Initially, the intention was to use Git version 1.2.4. However, while remote repository tests seemed to work with this version, local repository tests did not: clone operations on local repositories failed to create a usable repository, and subsequent operations failed. The same problem was observed with version 1.2.3.

The same problem was reported by someone else on the OpenSolaris tools-discuss mailing list, noting that 1.2.2 did work. This proved to be the case, so 1.2.2 was used for testing. Note that a remote repository created with 1.2.4 could not be accessed with 1.2.2, but recreating the repository fixed that problem.

3. Requirements

This section looks at the requirements as listed in the Distributed Source Code Management Requirements, version 1.4.

3.1 E0 - Open source

Git is open source, and available under the GPL, version 2.

3.2 E1 - Unbiased and disconnected distribution

Yes, Git works in a distributed fashion, and updates between distinct repositories are possible. Synchronization between repositories is done via explicit push/pull operations.

3.3 E2 - Networked operation

Networked operation is supported. Support for ssh connections is a built-in feature.

3.4 E3 - Interface stability and completeness

The metadata storage data format is claimed to be stable, with only one incompatible change having occurred in the past.
The incompatibility between versions noted in section 2 might point to problems in this area, but this has not been investigated further.

3.5 E4 - Standard operations and transactions

Rename is supported, but at the metadata level this is a copy followed by a delete. History is not preserved.

Deletion at the file and directory levels consists of removing the files at the filesystem level, and then performing a commit.

Delete file, commit, create new file with the same name, commit, is a supported sequence of operations. After the file is re-created and committed in the final step, it inherits the history of the original file.

A reverted deletion by user A, followed by a change to the same file by user B (based on a repository update made before the deletion), works.

It does not appear to be possible to reference deleted files, except of course when inspecting differences between revisions.

No equivalency errors were found during testing.

3.6 E5 - Per changeset metadata

It is possible to attach metadata to a changeset, via the git tagging commands.

3.7 C6 - Ease of use

Git is not hard to install. It required some modifications to the Makefile, but they were not major. One issue is that it requires a recent version of GNU diff, with a -L (label) option. The diff executable name and flags are hardcoded in the C and shell script sources, and had to be changed. Also, the Python script assumes /usr/bin/python exists, and should use the Python setup mechanism instead. Git prefers to have the RCS merge(1) command available, but its absence is not fatal.

The low-level tools interface is inconsistent at times (long versus short options; flags such as -n having a different meaning for different commands).

Git has a separate command, git-update-index, which updates the state recorded for a file in the repository before a commit.
Its use is sometimes confusing, as some commands perform this operation themselves (sometimes depending on which flags they were passed), while at other times an explicit git-update-index is required. For example, the mv command does an implicit update, but the add command does not.

3.8 C7 - "No dedicated server" operational mode

Git does not require a dedicated server.

3.9 C8 - Tool community health

The Git community is active, and the author actively interacts with users and developers on the primary Git mailing list. I expect that the author will be happy to accept patches, although Git currently has a strong connection to the Linux community, whose needs may take priority.

3.10 C9 - OpenSolaris community implementation expertise

At least one Sun engineer in the OpenSolaris community is active in the Git community and has worked with it for a while.

3.11 C10 - Interface extensibility

The following hooks are available, and are run if they are present as executables in the hooks subdirectory of the git configuration directory:

  applypatch-msg
  pre-applypatch
  post-applypatch
  pre-commit
  post-commit
  update
  post-update

3.12 C11 - Transactional operations and corruption recovery

I was unable to test this extensively, but the semantics do seem to be generally well-defined at the lowest repository level. There is an 'fsck' command to recover from a corrupted repository.

3.13 C12 - Content generality

Binary files are supported.

3.14 O13 - Partial trees

Partial trees are not supported.

3.15 O14 - Per-file histories

Per-file histories are not supported, but the Git core commands will extract the revisions that affected a given file when asked for the history of that file.

4. Evaluation

4.1 Test hardware used

psrinfo -v output:

  Status of virtual processor 0 as of: 03/31/2006 16:06:17
    on-line since 02/23/2006 11:23:45.
    The sparcv9 processor operates at 1600 MHz,
      and has a sparcv9 floating point processor.
  Status of virtual processor 1 as of: 03/31/2006 16:06:17
    on-line since 02/23/2006 11:23:42.
    The sparcv9 processor operates at 1600 MHz,
      and has a sparcv9 floating point processor.

uname -a output:

  SunOS klomp 5.11 snv_31 sun4u sparc SUNW,Sun-Blade-2500

Tests were run on a local 122G ZFS filesystem.

4.2 Test results

4.2.1 Speed

First commit (git add + git commit) of the OpenSolaris source tree: 1m40s

Clone of a remote repository created by the above commands, from Menlo Park, CA, US to Amersfoort, Netherlands over SWAN: 6m35s. This matches the line speed usually achieved over this connection, given that the remote clone operation packs and compresses the repository when transferring it.

Local clone of the same repository: 2m35s

Local commit of one file in the repository: 9s

4.2.2 Conflict resolution

A test harness was used to test the following conflict scenarios:

* Two users each have a clone of a central repository. Each makes a different change to the same line of the same file. Git correctly signaled this conflict, and directed the user to resolve it by hand.

* Three users each have a clone of a central repository. Two of them move the same files to different locations. The third user renames one of the files in its original directory. All then do a commit and a push. Git correctly noticed the rename conflicts and provided a message with the full renamed paths, prompting the user to resolve the conflict. For the renamed files, Git appears to pull in the renamed files from the central repository and undo the rename in the local repository, after which the user has to resolve the conflict. The user is not explicitly informed about this behavior.

A problem with conflict resolution lies with the commit command. Normally, commit will not deal with added or deleted files that have not explicitly been marked as such. However, the -a option should deal with this, and is advertised as: "Update all paths in the index file.
This flag notices files that have been modified and deleted, but new files you have not told git about are not affected." Several tutorials tell you to use the -a option routinely (they even seem to suggest always using it). However, commit -a will throw away any conflict information and will happily do a commit even if there are unresolved conflicts, which is definitely not the desired result.

4.3 Source code

The Git source code consists of C, Perl, shell and Python source:

  108 .c files (34277 lines)
  23 .h files (1411 lines)
  10 .perl files (4373 lines)
  38 .sh files (5242 lines)
  2 .py files (1219 lines)

The coding style in the C files is fairly consistent, but comments are extremely sparse, so it can be hard to tell what is going on, especially if some functionality is also present in shell, Perl or Python files. Spreading functionality across languages in this way is inconsistent, and makes contributing harder for third parties.

4.4 Filesystem usage

  Master: 360M source, 125M repository
  Clone:  360M source, 88M repository

6. Conclusions

The original goal for Git was to be fast. It does seem to be pretty fast for several of its operations. It also has an active and enthusiastic community, which gives it momentum.

The downsides are:

* Needing to go two versions back to find a version that worked for some very basic operations (e.g. creating and cloning a repository) is not good.

* The source code is inconsistent in places (in the languages it is written in), and needs much more documentation. It also has a lot of hardcoded names in it (diff command and flags, hook names).

* Documentation is available for all commands, but it can be sparse.

* Commit -a should not throw away conflict information.

* The update-index command seems counter-intuitive and is used inconsistently amongst the git core commands.

* The flags are sometimes inconsistent from command to command.
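
One way to work around the commit -a concern listed above is to check the index for unmerged entries before committing. The following is a minimal Python sketch of that idea, not part of Git and not something used in this evaluation. The function names unmerged_paths and guarded_commit_all are hypothetical. It assumes that the git executable is on the PATH and that "git ls-files --unmerged" prints one line per conflict stage in the form "<mode> <object> <stage><TAB><path>". Paths containing unusual characters may be printed quoted, which this sketch does not handle.

```python
import subprocess
from typing import List


def unmerged_paths(ls_files_output: str) -> List[str]:
    """Return the distinct paths found in `git ls-files --unmerged` output.

    Each line is expected to look like "<mode> <object> <stage><TAB><path>".
    A conflicted path is listed once per stage, so duplicates are collapsed
    while the original order is kept.
    """
    paths: List[str] = []
    for line in ls_files_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or unexpected lines
        path = line.split("\t", 1)[1]
        if path not in paths:
            paths.append(path)
    return paths


def guarded_commit_all(message: str, repo_dir: str = ".") -> List[str]:
    """Run `git commit -a` only when the index has no unmerged entries.

    Hypothetical helper: returns the conflicted paths if any were found
    (in which case no commit is attempted), or an empty list after the
    commit command has been run.
    """
    listing = subprocess.run(
        ["git", "ls-files", "--unmerged"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    conflicted = unmerged_paths(listing.stdout)
    if conflicted:
        return conflicted  # refuse to commit; the caller reports the paths
    subprocess.run(
        ["git", "commit", "-a", "-m", message],
        cwd=repo_dir, check=True,
    )
    return []
```

A caller would invoke guarded_commit_all("some message") from inside a working tree and print the returned paths if the list is not empty. The parsing is kept in a separate function so that it can be checked without a repository at hand.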