a program to delete duplicate files

John J. Lee jjl at pobox.com
Mon Mar 14 05:34:57 EST 2005


Patrick Useldinger <pu.news.001 at gmail.com> writes:
> John Machin wrote:
[...]
> > 2. As others have explained, with a decent hash function, the
> > probability of a false positive is vanishingly small. Further, nobody
[...]
> Still, if you can get it 100% right automatically, why would you
> bother checking manually? Why get back to arguments like "impossible",
> "implausible", "can't be" if you can have a simple and correct answer:
> yes or no?
[...]

Well, as Francois pointed out, strictly speaking it is not physically
possible to obtain a perfectly reliable answer, even if you *do* do the
comparison.
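For the accidental case, the "vanishingly small" risk mentioned in the quote can be put into numbers with the usual birthday-bound approximation. The sketch below assumes the hash behaves like a uniformly random function of its input. That assumption is exactly what a deliberately crafted collision breaks, so the estimate does not cover that case. The function name `collision_probability` is hypothetical.

```python
import math


def collision_probability(n_files, hash_bits):
    """Approximate chance that any two of n_files distinct files share a hash.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(b+1)).
    Assumes an ideal (uniformly random) hash; it says nothing about
    collisions that someone constructs on purpose.
    """
    pairs = n_files * (n_files - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)
```

For example, `collision_probability(10**6, 128)` comes out at about 1.5e-27 for a million files and a 128-bit digest. `collision_probability(100000, 32)` is above 0.5, which is why short checksums are not adequate for this job.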

Even so, you're right on this point (though IIUC it's not practically
important ATM): quite apart from wild flukes, people can deliberately
craft files that produce a hash collision, so a full comparison is
better than hashing alone from this PoV.
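One common way to get both speed and certainty is to use hashing only as a filter and let a byte-for-byte comparison make the final call. With this design, a collision (accidental or crafted) can at worst cost an extra comparison and can never cause a false match. The sketch below follows that approach. It is not the program from this thread. The helper names `file_digest` and `find_duplicates` are hypothetical, and the choice of SHA-256 and a 64 KiB read size is an assumption.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths whose contents are byte-for-byte identical.

    The hash only narrows down candidates; group membership is decided
    by a full comparison, so a hash collision cannot yield a false match.
    """
    # 1. Files of different sizes can never be identical.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # 2. Split each size bucket by content hash.
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        for candidates in by_hash.values():
            # 3. Confirm by comparing contents against each cluster's first member.
            clusters = []
            for p in candidates:
                for cluster in clusters:
                    if filecmp.cmp(cluster[0], p, shallow=False):
                        cluster.append(p)
                        break
                else:
                    clusters.append([p])
            groups.extend(c for c in clusters if len(c) > 1)
    return groups
```

Grouping by size first means most files are never read at all. Files that share a size are then read once for hashing, and only files whose hashes match are read again for comparison.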


John
