md5 and large files

Josiah Carlson jcarlson at
Mon Oct 18 20:30:49 CEST 2004

> Of course I can't... *grin* -- But actually, (correct me if I'm wrong) an 
> MD5 sum is 128 bits long, that are 2^128 different possibilities. Now a 2 
> GB file has 8*2*1024^3 bits, that are 2^17179869184 different 
> possibilities for a 2 GB file. Am I wright thinking that the number of 
> files with an identical md5 sum is now 2^17179869184 / 2^128 = 
> 2^17179869056?

(I know this is a major simplification, but bear with me)

Your probability and numbers are correct.  The problem is that 2^128 is
so damn huge, if a billion computers running at a billion md5s/sec were
all running, it would still take ~10,790,283,070,806 years to see all
md5 digests, assuming that every attempt produces a unique digest (until
we have actually seen them all).

Without a specific attack on md5 (which there is one), the probability
of choosing two arbitrary strings of K characters each, that hash to the
same value with md5, is around 1 in 2^127.  More precisely, you are more
likely to get hit by lightning while being a major candidate for the US
presidential election, than to discover such a pair.

 - Josiah

More information about the Python-list mailing list