I've used unison for a long while for keeping things like my music in sync between machines. But it's never felt entirely safe, or right. (Or fast!) Using a VCS would be better, but would consume a lot more space.

Well, space still matters on laptops, with their smallish SSDs, but I have terabytes of disk on my file servers, so VCS space overhead there is no longer of much concern for files smaller than videos. So, here's a way I've been experimenting with to get rid of unison in this situation.

    git clone --shared /mnt/fileserver/stuff.git stuff

Caveats:

(Thanks to Ted Ts'o for the hint about using --shared, which makes this work significantly better, and simpler.)

You can avoid the links.
Check out the --git-dir and --work-tree options to git. You should be able to use those to avoid your links...
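
As a rough sketch of what those two options do, the script below keeps the repository metadata and the files in separate directories, so nothing named .git has to live inside the tree. The paths, the file name, and the `g` helper function are made up for illustration, and everything happens in a scratch directory.

```shell
#!/bin/sh
set -eu

# Scratch area; all paths below are hypothetical.
tmp=$(mktemp -d)
git_dir="$tmp/stuff.git"
work_tree="$tmp/stuff"
mkdir -p "$work_tree"
echo "example" > "$work_tree/song.txt"

# A bare repository has no working tree of its own.
git init -q --bare "$git_dir"

# Helper (an illustration, not a git feature): name both locations on every call.
g() { git --git-dir="$git_dir" --work-tree="$work_tree" "$@"; }

cd "$work_tree"
g add song.txt
g -c user.name=Example -c user.email=example@example.invalid commit -q -m "add song"

# The commit is stored in $git_dir; the tree itself holds only song.txt.
g log --oneline
```

The same two options can also be supplied through the GIT_DIR and GIT_WORK_TREE environment variables.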
Comment by ejr [claimid.com] Wednesday afternoon, January 21st, 2009
Another way of avoiding the hardlinks...

When you do the clone, use git clone -s. This sets up the file $GIT_DIR/objects/info/alternates to point at the objects directory of the base repository. It means that git will look for objects in the base repository if they can't be found in the clone directory. That way you can do git gc without worrying about breaking the hard links.

For example, I normally keep /usr/projects/linux/base as a clone of Linus's linux repository. I do my local hacking in various repositories that are cloned using "git clone -s base ext4". This causes /usr/projects/linux/ext4/.git/objects/info/alternates to contain the single line "/usr/projects/linux/base/objects", and the objects directory is otherwise completely empty. When I make commits into the ext4 repository, those objects are created in /usr/projects/linux/ext4/.git/objects; and then when I push them to the base directory, a copy is made there. Afterwards, if I do a "git gc" in /usr/projects/linux/ext4, those objects will eventually disappear. (There is an expiry time for safety reasons, so they won't disappear right away, unless you explicitly prune the reflog via "git reflog expire --expire=0 --expire-unreachable=0 --all; git gc; git prune". Why this is so is beyond the scope of this comment, though. :-)

In any case, the advantage of using "git clone -s" is that git gc is safe; you don't have to worry about breaking hard links and causing the disk usage to explode. The downside is that there is only one copy of the objects, so the savings in disk space also make your system slightly less robust against random disk-induced data loss, should you have local hard disk corruption.
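
The alternates mechanism described above can be seen in a throwaway repository. This is only a sketch: the directory names "base" and "work" are placeholders, and because the base here is an ordinary non-bare repository, the recorded path ends in base/.git/objects.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d)
cd "$tmp"

# A stand-in for the base repository (hypothetical name "base").
git init -q base
echo "hello" > base/file.txt
git -C base add file.txt
git -C base -c user.name=Example -c user.email=example@example.invalid commit -q -m "initial"

# -s (--shared) records the base's object store instead of copying or hardlinking objects.
git clone -q -s base work

# The alternates file holds the path to the base repository's objects directory.
alternates=$(cat work/.git/objects/info/alternates)
echo "alternates: $alternates"

# Count object files stored in the clone itself (typically none right after cloning).
local_objects=$(find work/.git/objects -type f ! -path '*/info/*' | wc -l | tr -d ' ')
echo "local object files: $local_objects"
```

History is still fully readable from the clone, because git falls back to the alternate object store for anything it cannot find locally.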

I do like the idea of experimenting with using git as a replacement for Unison. One potential problem which does come to mind is that git doesn't preserve file permissions (beyond the executable bit), which might be an issue in some cases...

Comment by tytso [livejournal.com] Wednesday evening, January 21st, 2009
Avoiding clones altogether

There is a script in git's contrib directory, called something like git-new-workdir. It will create a valid .git dir that symlinks everything to the source. That way you don't get duplicate objects even if you commit in this new workdir. However, you have to be careful to check out a different branch in each such workdir, since git has no way to know what other workdirs exist, and so cannot update their working tree content if you commit to a branch they have checked out. This is also the reason why it's a contrib script and not an official command.

Comment by drak.ucw.cz/~bulb// in the wee hours of Wednesday night, January 22nd, 2009
pack.packSizeLimit
You can use the git configuration variable "pack.packSizeLimit" to keep the packfiles small enough to handle.
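
A minimal sketch of setting it, run in a scratch repository. The 100m value and the file name are arbitrary examples, not recommendations.

```shell
#!/bin/sh
set -eu

# Scratch repository; the path is a placeholder.
tmp=$(mktemp -d)
git init -q "$tmp/stuff"
cd "$tmp/stuff"

# Cap the size of each packfile git writes; 100m is an arbitrary example value.
git config pack.packSizeLimit 100m

# Commit something, then repack so the setting is in effect for the new pack(s).
echo "example" > notes.txt
git add notes.txt
git -c user.name=Example -c user.email=example@example.invalid commit -q -m "add notes"
git repack -a -d -q

# Show the stored value.
git config --get pack.packSizeLimit
```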
Comment by dmarti [myopenid.com] Tuesday afternoon, October 27th, 2009
comment 5

I've also started making a .gitattributes file that contains something like: * -delta

That should speed up commits and/or packs by avoiding it trying delta compression. Have not used it long enough (and it's not documented) to know for sure where/how it helps.
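
To check that such a pattern is actually picked up, git can be asked which value of the delta attribute it resolves for a given path. This is only a sketch in a scratch repository, and the file name is made up.

```shell
#!/bin/sh
set -eu

# Scratch repository; the path and file name are placeholders.
tmp=$(mktemp -d)
git init -q "$tmp/music"
cd "$tmp/music"

# Unset the delta attribute for every path in the repository.
printf '* -delta\n' > .gitattributes

# Report the attribute value for a sample path (the file does not need to exist).
git check-attr delta -- track01.flac
# prints "track01.flac: delta: unset"
```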

Comment by joey [kitenet.net] late Tuesday evening, October 27th, 2009
comment 6

Note that when committing from a shared clone, you do end up with locally written objects. I clear these out by running something like this:

    git push && rm -vf "$MR_REPO"/.git/objects/??/*

It's "safe" to delete the local objects once they're pushed to the server.

Comment by joey [kitenet.net] late Tuesday evening, October 27th, 2009