a program to delete duplicate files
Patrick Useldinger
pu.news.001 at gmail.com
Sat Mar 12 11:58:15 EST 2005
François Pinard wrote:
> Identical hashes for different files? The probability of this happening
> should be extremely small, or else, your hash function is not a good one.
We're talking about md5, sha1 or similar. They are all known not to be
100% perfect. I agree it's a rare case, but still, why settle on
something "about right" when you can have "right"?
> I once was over-cautious about relying on hashes only, without actually
> comparing files. A friend convinced me, doing maths, that with a good
> hash function, the probability of a false match was much, much smaller
> than the probability of my computer returning the wrong answer, despite
> thorough comparisons, due to some electronic glitch or cosmic ray. So,
> my cautious attitude was by far, for all practical means, a waste.
It was not my only argument for not using hashes. My algorithm also does
fewer reads, for example.
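
To illustrate the point about reads, here is a minimal Python sketch. It is not the program discussed in this thread, only an assumption about how a direct comparison can avoid reading whole files: files are grouped by size first, and candidates are then compared chunk by chunk, stopping at the first difference. The names same_content, find_duplicates and CHUNK are hypothetical.

```python
import os
from collections import defaultdict

CHUNK = 64 * 1024  # read size in bytes; an arbitrary choice for this sketch


def same_content(path_a, path_b, chunk=CHUNK):
    """Return True if both files hold identical bytes.

    Reading stops at the first differing chunk, so two files that
    differ early are never read to the end.
    """
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # different sizes: no read needed at all
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if a != b:
                return False
            if not a:  # both reads returned b"": end of both files
                return True


def find_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue  # a file with a unique size is never opened
        clusters = []  # each cluster holds files known to be identical
        for p in candidates:
            for cluster in clusters:
                if same_content(cluster[0], p):
                    cluster.append(p)
                    break
            else:
                clusters.append([p])
        groups.extend(c for c in clusters if len(c) > 1)
    return groups
```

A hash-based approach has to read every candidate file in full to compute its digest, whereas this comparison can give up on a pair as soon as one chunk differs. The trade-off is that a size group with many distinct files needs more pairwise comparisons.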
-pu