Assembling bits of history with git

The dpkg team has a long history of changing VCS. At the beginning, Ian Jackson simply uploaded new tarballs; then CVS was used for a few years, then Arch, and until now Subversion. When the Subversion repository was created, the Arch history was not integrated because the conversion tools somehow didn’t work.

Now we’re likely to move over to git for various reasons, and we wanted to recover the bits of history stored in the different VCS. Unfortunately we lost the Arch repository. So we have disjoint bits of history and we want to put them all in a single nice git branch… git comes with git-cvsimport, git-archimport and git-svnimport, so converting CVS/SVN/Arch repositories is relatively easy. But you end up with several repositories and several branches.

Git comes with a nice feature called “git rebase” which is able to replay history over another branch, but for this to work you need to have a common ancestor in the branch used for the rebase. That’s not the case… so let’s try to create that common ancestor! Extracting the first tree from the newest branch and committing it on top of the oldest branch gives that common ancestor, because two identical trees have the same identifier. “git archive” lets you extract that first tree, and git_load_dirs lets you easily load it into your git repository.
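The claim that identical trees share an identifier can be checked on a throwaway pair of repositories. This is only a sketch: the directory names "a" and "b", the file content and the demo identity are made up for the illustration and have nothing to do with the dpkg repositories.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Identity used only for these throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Two unrelated repositories that happen to contain the same files.
for repo in a b; do
    git init -q "$repo"
    echo "same content" > "$repo/file.txt"
    git -C "$repo" add file.txt
    git -C "$repo" commit -q -m "commit made in $repo"
done

# The tree identifier depends only on the content of the tree...
tree_a=$(git -C a rev-parse 'HEAD^{tree}')
tree_b=$(git -C b rev-parse 'HEAD^{tree}')
# ...whereas the commit identifier also covers message, author, date and parents.
commit_a=$(git -C a rev-parse HEAD)
commit_b=$(git -C b rev-parse HEAD)

echo "tree in a:   $tree_a"
echo "tree in b:   $tree_b"
echo "commit in a: $commit_a"
echo "commit in b: $commit_b"
```

The two tree lines should match while the two commit lines differ, which is why a commit carrying the newest branch's first tree can serve as the link between the histories.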

In short, let’s see how I attach the “master” branch of my “git-svn” repository to the “master” branch of my “git-cvs” repository:

$ cd git-svn
$ git rev-list --all | tail -1
0d6ec86c5d05f7e60a484c68d37fb5fc31146c40
$ git archive --prefix=dpkg-1.13.11/ 0d6ec86c5d05f7e60a484c68d37fb5fc31146c40 | (cd /tmp && tar xf -)
$ cd ../git-cvs
$ git checkout master
$ git_load_dirs -L"Fake commit to link SVN to older CVS history" /tmp/dpkg-1.13.11
[...]
$ git fetch ../git-svn master:svn
$ git checkout svn
$ git rebase master

That’s it, your svn branch now contains the old cvs history. Repeat as many times as necessary…
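To try the whole procedure without touching real repositories, here is a minimal sketch on two toy repositories. It is not the exact recipe above: instead of "git archive" plus git_load_dirs, it builds the fake commit directly from the tree of the newer branch's root commit with "git commit-tree". The repository contents, the commit messages and the demo identity are all assumptions made for the example.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Identity used only for these throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Helper: create an empty repository whose initial branch is "master".
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
}

# Helper: commit_file <repo> <content> <message>
commit_file() {
    echo "$2" > "$1/version.txt"
    git -C "$1" add version.txt
    git -C "$1" commit -q -m "$3"
}

# Toy stand-ins for the two imported repositories.
make_repo git-cvs
commit_file git-cvs 1.0 "cvs: 1.0"
commit_file git-cvs 1.1 "cvs: 1.1"
make_repo git-svn
commit_file git-svn 1.2 "svn: 1.2"
commit_file git-svn 1.3 "svn: 1.3"

cd git-cvs
git fetch -q ../git-svn master:svn

cvs_tip=$(git rev-parse master)
svn_tree=$(git rev-parse 'svn^{tree}')
root=$(git rev-list --max-parents=0 svn)   # first commit of the newer history

# Fake commit: same tree as the newer history's root, parent is the old tip.
fake=$(git commit-tree "$root^{tree}" -p master \
        -m "Fake commit to link SVN to older CVS history")
git reset -q --hard "$fake"                # move master (checked out) to it

git checkout -q svn
git rebase -q master

git log --format='%h %s' svn
```

The final log lets you check that the commits of the newer history now sit on top of the older ones, with the fake commit in between.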

Comments

  1. siprbaum says:

    You could also use a graft file (.git/info/grafts) where you could overwrite the parenthood of a commit. This mechanism is also used to connect the pre-git version of the linux kernel (bk import; <= 2.6.11) with the version managed in git.

    After that, you could also use git-filter-branch (AFAIK unreleased, but in the -rc version) to alter the history permanently (graft files are repo-local).

  2. Anonymous says:

    git-cvsimport has known bugs, and frequently mangles history. I highly recommend checking its results carefully, or better yet not using it. Use parsecvs for a high-quality import.

  3. siprbaum: I knew about the grafts file, and it’s convenient to use. But as you mentioned, it’s repo-specific: a “git clone” won’t download the graft file. That’s why I went for the rebase approach.

    I’ll look into git-filter-branch once it’s in a stable release of git. There’s no hurry any more for me now… :-)

  4. For other people who wonder about parsecvs (like I just did), you can find that software here: http://gitweb.freedesktop.org/?p=users/keithp/parsecvs.git

  5. Note that git-filter-branch is the same as the script in cogito called cg-admin-rewritehist, which is stable and extremely serviceable.

    When it comes to doing history touch-ups and integrating history from lots of different sources, I’d call git-filter-branch indispensable. Get yourself a 1.5.3 RC version, or use the version of the tool in cogito.