So, sha-1 is looking increasingly insecure in applications where birthday attacks are possible. ("Birthday attacks" ... what a phrase ... I hope my non-technical readers stopped at "sha-1".)

Two things about that:


First, I wanted to mention that I've today released jetring 0.15, which adds support for arbitrary hashes in the index file, and deprecates use of sha-1, going to sha-256 by default. There is a jetring-checksum -u utility that can be used to upgrade sha-1 hashes in existing jetring index files.

If you're using jetring in an application where changesets are provided by third parties, then a birthday attack could be possible (though not easy?), and you should upgrade your index. debian-maintainer is a good example of such a jetring user.


Secondly, our beloved git uses sha-1, and this seems unlikely to change soon or without significant pain. So, what kinds of collision attacks would you need to watch out for when using git? Here is a real-world example I've been pondering. Is it accurate?

Here's a different scenario..

This seems more plausible. This sort of attack is easy to accomplish with a subversion repository, and was one of the reasons I was glad to switch to git, since its checksums and signed tags seemed to prevent this kind of mischief. So, worrying, especially if your project uses such a build server.

Update: Thanks to the commenters for helping me correct my example. (I hope!)

Git, files with same sha-1
The first thing that strikes my mind when thinking about git and sha-1 is not security (that's only #2), but whether git will break when the user wants to keep two files with the same checksum under version control. If I were a security researcher studying sha-1 birthday attacks, I'd want to keep the files under version control...
Comment by liw.fi at midnight, May 7th, 2009
comment 2

It shouldn't matter if your research files have the same sha-1, because you'd commit file A and B with different commit messages, parents, and commit times. Since git checksums all of those, in addition to the file content, it should not care.

There might potentially be problems with anything in the working copy/index code that looks at sha1's of files getting confused if you replace file A with B, and thinking you haven't modified it. But once they're committed, the checksums will differ.

Comment by joey [kitenet.net] late Wednesday night, May 7th, 2009
comment 3
An attacker doesn't need to worry about commit objects. Each blob is identified by a SHA1 of its contents only, not even including its filename or permissions - the latter are part of the tree object. The security researcher case is a classic example of how pure hash-based content addressing schemes fail, including git's.
Comment by fanf [livejournal.com] late Wednesday night, May 7th, 2009
@c1 and c2
Git first stores the raw file contents as blobs, addressed only by the content sha-1 (which is something like SHA1(TYPE+LENGTH+BYTESTRING)). The colliding files may collide as bare bytestrings, but possibly don't collide once the git prefix (which might be only 2-4 bytes long) is added, so the files are probably safe for storage with git. However, if the blob ids WOULD collide, git would conclude the files were identical, and only one version of the blob, and thus of the file contents, would be stored. ulrik
Comment by kaizer.se in the wee hours of Wednesday night, May 7th, 2009
@c2
Basically commit ID (commit sha-1) does not matter, since commits only reference trees which in turn reference blobs by id. The collisions in git are possible for blobs, but only if they collide using git-hash-object, which I wanted to say in the last comment.
Comment by kaizer.se in the wee hours of Wednesday night, May 7th, 2009
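
The blob id discussed in the comments above can be reproduced outside git. This is a minimal sketch, not git's own code: the function name `git_blob_id` is made up for illustration, and it assumes the usual loose-object header of the form `blob <length>\0` placed in front of the file contents before hashing.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the sha-1 that git would assign to a blob with this content.

    Assumption: the object header is b"blob <decimal length>\\0".
    The filename and permissions are not part of the input; they live
    in the tree object that references the blob.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same value that `git hash-object` reports for a file holding "hello world\n".
    print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    # The plain sha-1 of the same bytes is a different number, because of the header.
    print(hashlib.sha1(b"hello world\n").hexdigest())
```

Because the header is hashed together with the contents, two files whose plain sha-1 values match do not necessarily get the same blob id. Two files whose blob ids match would be stored as a single object.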
birthday attack?

hi, i am not familiar with 'the birthday attack on sha-1'.

if i assume the nature of the birthday-attack right, it is based upon the probability of having two random pieces of data hash to the same value.

how could you use that to produce a piece of data whose hash value matches that of a predetermined piece of data? isn't that another problem-set?

Comment by dmk [getopenid.com] mid-morning Sunday, May 10th, 2009
birthday attack!

ah! i just got it, rereading your blogentry.

you have the good file and a bad file. and then you modify both files randomly (via a comment or somesuch) until you get the same sha-1.

i can imagine smth like this working. but of course the 2^52 is probably still only theoretically broken? don't know what effort such an attack would take. (like how much "1000 dollar worth of hardware running one day")

Comment by dmk [getopenid.com] late Sunday morning, May 10th, 2009
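
The "modify both files until the hashes match" idea from the comment above can be tried at toy scale. The sketch below is an illustration only: it truncates sha-1 to a small number of bits so that a brute-force search finishes in moments, and the names `short_hash` and `find_collision` are invented for this example. It says nothing about the real cost of attacking full sha-1, where the 2^52 figure comes from cryptanalysis rather than from blind guessing.

```python
import hashlib
from itertools import count


def short_hash(data: bytes, bits: int = 24) -> int:
    """Top `bits` bits of sha-1: a deliberately weakened stand-in hash."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def find_collision(good: bytes, bad: bytes, bits: int = 24):
    """Append throwaway comments to both files until a good variant and a
    bad variant share the same truncated hash.

    Returns (good_variant, bad_variant, variants_tried_per_file).
    """
    seen_good = {}  # truncated hash -> good variant
    seen_bad = {}   # truncated hash -> bad variant
    for i in count():
        suffix = b"\n# " + str(i).encode("ascii")
        g, b = good + suffix, bad + suffix
        hg, hb = short_hash(g, bits), short_hash(b, bits)
        seen_good[hg] = g
        seen_bad[hb] = b
        if hg in seen_bad:
            return g, seen_bad[hg], i + 1
        if hb in seen_good:
            return seen_good[hb], b, i + 1


if __name__ == "__main__":
    g, b, tried = find_collision(b"print('good')", b"print('evil')", bits=24)
    print(tried, "variants of each file were needed")
    print(short_hash(g) == short_hash(b), g != b)
```

With an n-bit hash, a match is expected after roughly 2^(n/2) variants of each file, which is why the search is far cheaper than trying to match one fixed, predetermined file (roughly 2^n attempts).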
2^52

This was posted about the 2^52 on ietf-openpgp

Just to give you some perspective what WFO means at this day and age: my cryptography lab at the University has just built and tested a DES cracker that cost us less than €20000 EUR. It iterates through the 56-bit key space in about one week. We are considering using it for finding a SHA1 collision using these new results. But, as noted above, this would be a collision where both pre-images are carefully chosen by the attacker.
Comment by joey [kitenet.net] Sunday afternoon, May 10th, 2009