Shorter checksum than MD5
elbertlev at hotmail.com
Fri Sep 10 14:27:30 CEST 2004
> For your application, you should consider the total number of records
> you ever expect to have, and use more than 2 * lg(records) bits of hash.
> Due to the so-called "birthday paradox", when you have N possible hash
> values, two will be identical with 50% probability with around sqrt(N)
> items. You'd probably prefer that the probability be much lower in your
> application, since a collision will result in incorrect results.
Wrong! "birthday paradox" is not applicable here.
If you want an analogy with this combinatorial problem, imagine 2 rows with N
objects in each, There exists a "measure" of each object.
Some objects can be modified with probability 1/2**32
the measure will not change after modification.
Objects in the SAME POSITION in each row are compared by comparing their measures.
After M objects are modified what is the probability that
at least one modification will be "missed" by the comparison process.
I don't think, that in the foreseen future (if M and N are not too high)
such collision will occur.
More information about the Python-list