Why checksum? [was Re: Fuzzy Lookups]

Steven D'Aprano steve at REMOVETHIScyber.com.au
Tue Jan 31 16:28:17 EST 2006

On Tue, 31 Jan 2006 10:51:44 -0500, Gregory Piñero wrote:

> http://www.blendedtechnologies.com/removing-duplicate-mp3s-with-python-a-naive-yet-fuzzy-approach/60
> If anyone would be kind enough to improve it I'd love to have these
> features but I'm swamped this week!
> - MD5 checking to find exact matches regardless of name
> - Put each set of duplicates in its own subfolder.

This isn't a criticism; it is a genuine question. Why do people compare
local files with MD5 instead of doing a byte-for-byte comparison? Is it purely
a caching thing (once you have the checksum, you don't need to read the
file again)? Are there any other reasons?
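
To make the two options concrete, here is a minimal sketch of each,
assuming a current Python with hashlib. The names same_bytes, md5_of,
group_by_md5 and CHUNK are hypothetical, chosen for illustration, and are
not taken from the script linked above.

```python
import hashlib
import os

CHUNK = 64 * 1024  # read size in bytes; an arbitrary choice for this sketch


def same_bytes(path_a, path_b):
    """Byte-for-byte comparison of two files.

    Returns as soon as a difference is found, so unequal files are
    usually rejected without reading either one to the end.
    """
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(CHUNK)
            b = fb.read(CHUNK)
            if a != b:
                return False
            if not a:  # both reads hit end of file together
                return True


def md5_of(path):
    """MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_md5(paths):
    """Read each file once and group paths that share a digest.

    Only groups with more than one member are returned. A matching
    digest is strong evidence of equality, not proof of it.
    """
    groups = {}
    for p in paths:
        groups.setdefault(md5_of(p), []).append(p)
    return [g for g in groups.values() if len(g) > 1]
```

With N files, group_by_md5 reads each file exactly once, whereas using
same_bytes on every pair may reopen the same file many times. The standard
library's filecmp.cmp(a, b, shallow=False) also does a content comparison,
for anyone who would rather not hand-roll same_bytes.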

