binary file compare...

Martin martin at
Wed Apr 15 15:25:27 CEST 2009

On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano
<steven at> wrote:
> The checksum does look at every byte in each file. Checksumming isn't a
> way to avoid looking at each byte of the two files, it is a way of
> mapping all the bytes to a single number.

My understanding of the original question was a way to determine
wether 2 files are equal or not. Creating a checksum of 1-n files and
comparing those checksums IMHO is a valid way to do that. I know it's
a (one way) mapping between a (possibly) longer byte sequence and
another one, how does checksumming not take each byte in the original
sequence into account.

I'd still say rather burn CPU cycles than development hours (if I got
the question right), if not then with binary files you will have to
find some way of representing differences between the 2 files in a
readable manner anyway.

> Hashing is a *lot* more work than just comparing two bytes. The MD5
> checksum has been specifically designed to be fast and compact, and the
> algorithm is still complicated:

I know that the various checksum algorithms aren't exactly cheap, but
I do think that just to know wether 2 files are different a solution
which takes 5mins to implement wins against a lengthy discussion which
optimizes too early wins hands down.



You are not free to read this message,
by doing so, you have violated my licence
and are required to urinate publicly. Thank you.

Please avoid sending me Word or PowerPoint attachments.

More information about the Python-list mailing list