Shorter checksum than MD5

Paul Rubin http
Fri Sep 10 09:54:02 CEST 2004

Paul Rubin <> writes:
> > If all 2**32 checksums are equally likely, the probability of a
> > collision is only about 0.0000232828.
> That's incorrect, the probability is much higher.  It's more like 0.7.

I should be clear: the claim was, if you have 100000 records and
32-bit hash, with probability around 0.7, some two records will have
the same hash.  That makes some uses of the hashes inappropriate.

However, if you only care about record #n on system A accidentally
colliding with record #n on system B, rather than about it colliding
with an arbitrary other record, then you're in much better shape.

If you're worried about intentional collisions that an attacker might
construct, rather than only about accidental collisions, you need a
longer hash regardless of the number of records.

More information about the Python-list mailing list