binary file compare...
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Wed Apr 15 05:03:16 EDT 2009
On Wed, 15 Apr 2009 07:54:20 +0200, Martin wrote:
>> Perhaps I'm being dim, but how else are you going to decide if two
>> files are the same unless you compare the bytes in the files?
>
> I'd say checksums, just about every download relies on checksums to
> verify you do have indeed the same file.
The checksum does look at every byte in each file. Checksumming isn't a
way to avoid looking at each byte of the two files, it is a way of
mapping all the bytes to a single number.
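To make that concrete, here is a minimal sketch of checksumming a file with the standard library's hashlib. The helper name file_md5 and the chunk size are my own choices for illustration, not anything from the thread. Note that the loop has to feed every byte of the file to the digest before it can produce a result.

```python
import hashlib

def file_md5(path, chunk_size=64 * 1024):
    """Return the MD5 hex digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # End of file: every byte has now been fed to the digest.
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing file_md5(a) == file_md5(b) therefore reads both files in full, even when they differ in the very first byte.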
>> You could hash them and compare the hashes, but that's a lot more work
>> than just comparing the two byte streams.
>
> hashing is not exactly much work; in its simplest form it's 2 lines per
> file.
Hashing is a *lot* more work than just comparing two bytes. The MD5
checksum has been specifically designed to be fast and compact, and the
algorithm is still complicated:
http://en.wikipedia.org/wiki/MD5#Pseudocode
The reference implementation is here:
http://www.fastsum.com/rfc1321.php#APPENDIXA
SHA-1 is more complicated still:
http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-1_pseudocode
Just because *calling* some checksum function is easy doesn't make the
checksum function itself simple. They do a LOT more work than just a
simple comparison between bytes, and that's totally unnecessary work if
you are making a one-off comparison of two local files.
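For comparison, a direct byte comparison might look something like the sketch below. The function name files_equal, the chunk size, and the up-front size check are my own assumptions, added for illustration. It stops at the first chunk that differs, so it never does more than read the bytes and test them for equality.

```python
import os

def files_equal(path_a, path_b, chunk_size=64 * 1024):
    """Return True if two local files have identical contents (hypothetical helper)."""
    # Files of different sizes cannot be equal, so no bytes need to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                # First mismatch found: stop without reading the rest.
                return False
            if not chunk_a:
                # Both files reached end of file with no mismatch.
                return True
```

The standard library already offers much the same thing as filecmp.cmp(path_a, path_b, shallow=False), which is probably the better choice in real code.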
--
Steven