# md5 and large files

Josiah Carlson jcarlson at uci.edu
Mon Oct 18 20:30:49 CEST 2004

```> Of course I can't... *grin* -- But actually, (correct me if I'm wrong) an
> MD5 sum is 128 bits long, that are 2^128 different possibilities. Now a 2
> GB file has 8*2*1024^3 bits, that are 2^17179869184 different
> possibilities for a 2 GB file. Am I wright thinking that the number of
> files with an identical md5 sum is now 2^17179869184 / 2^128 =
> 2^17179869056?

(I know this is a major simplification, but bear with me)

Your probability and numbers are correct.  The problem is that 2^128 is
so damn huge, if a billion computers running at a billion md5s/sec were
all running, it would still take ~10,790,283,070,806 years to see all
md5 digests, assuming that every attempt produces a unique digest (until
we have actually seen them all).

Without a specific attack on md5 (which there is one), the probability
of choosing two arbitrary strings of K characters each, that hash to the
same value with md5, is around 1 in 2^127.  More precisely, you are more
likely to get hit by lightning while being a major candidate for the US
presidential election, than to discover such a pair.

- Josiah

```