Steven D'Aprano steven at
Wed Apr 15 11:03:16 CEST 2009

On Wed, 15 Apr 2009 07:54:20 +0200, Martin wrote:

>> Perhaps I'm being dim, but how else are you going to decide if two
>> files are the same unless you compare the bytes in the files?
> I'd say checksums, just about every download relies on checksums to
> verify you do have indeed the same file.

The checksum does look at every byte in each file. Checksumming isn't a 
way to avoid looking at each byte of the two files, it is a way of 
mapping all the bytes to a single number.
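
To make that concrete, here is a minimal sketch of checksumming a file. The function name file_crc32 and the chunk size are illustrative assumptions, not anything from the thread, and CRC-32 stands in for whichever checksum you prefer. It also counts the bytes fed to the checksum, which always equals the file size:

```python
import zlib

def file_crc32(path, chunk_size=64 * 1024):
    """Return (crc32, bytes_read) for the file at path.

    Hypothetical helper for illustration: every byte of the file
    passes through zlib.crc32, one chunk at a time.
    """
    checksum = 0
    bytes_read = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Fold this chunk into the running checksum.
            checksum = zlib.crc32(chunk, checksum)
            bytes_read += len(chunk)
    return checksum & 0xFFFFFFFF, bytes_read
```

No byte is skipped: the whole file is read, then reduced to one 32-bit number.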

>> You could hash them and compare the hashes, but that's a lot more work
>> than just comparing the two byte streams.
> hashing is not exactly much work; in its simplest form it's 2 lines per
> file.

Hashing is a *lot* more work than just comparing two bytes. The MD5 
checksum has been specifically designed to be fast and compact, and the 
algorithm is still complicated:

The reference implementation is here:

SHA-1 is even more complicated still:

Just because *calling* some checksum function is easy doesn't make the 
checksum function itself simple. They do a LOT more work than just a 
simple comparison between bytes, and that's totally unnecessary work if 
you are making a one-off comparison of two local files.
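
For comparison, here is a rough sketch of the direct approach. The name same_contents and the chunk size are my own assumptions for illustration. It stops at the first differing chunk, and skips reading entirely when the sizes differ, which a hash-based comparison cannot do:

```python
import os

def same_contents(path_a, path_b, chunk_size=64 * 1024):
    """Return True if the two regular files hold identical bytes.

    Hypothetical helper for illustration; no hashing involved.
    """
    # Files of different sizes cannot be equal, so don't read them at all.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # bail out at the first difference
            if not chunk_a:
                return True   # both files ended with no difference
```

The standard library's filecmp.cmp(path_a, path_b, shallow=False) does much the same job if you would rather not write it yourself.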


