binary file comparison with the md5 module

Christian Reyes christian at rocketnetwork.com
Wed Jun 13 14:05:01 EDT 2001


I'm trying to write a script that takes two binary files and reports whether
or not their contents match exactly.

One of my peers suggested that an efficient way to do this would be to run
the md5 algorithm on each file and then compare the resulting output.  Since
md5 returns a 128-bit checksum of its input that is unique for all practical
purposes, this should theoretically work.
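
For illustration, the checksum idea could be sketched roughly as follows. This
assumes a modern Python, where hashlib provides MD5 in place of the older md5
module; the names file_md5, same_md5 and the chunk size are hypothetical.
Matching digests make identical contents overwhelmingly likely for ordinary
files, but they are not a strict proof of equality.

```python
import hashlib

def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in binary mode."""
    digest = hashlib.md5()
    with open(path, "rb") as f:          # "rb" so the bytes are read unmodified
        while True:
            chunk = f.read(chunk_size)   # read in pieces to keep memory use small
            if not chunk:                # empty bytes object means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()

def same_md5(path_a, path_b):
    """True if both files produce the same MD5 digest."""
    return file_md5(path_a) == file_md5(path_b)
```

Reading in chunks means a multi-megabyte file never has to sit in memory all
at once.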

The problem I'm having is with reading the binary file in as a string.

I tried opening the file with the built-in Python open function and then
reading the contents of the file into a buffer.  But I think my problem is
that when I read the binary file into a buffer, the contents get altered
somehow.  I would expect the print statement to give me some huge string of
gibberish, but instead what I get is 'RIFFnap', regardless of the size of the
file.  Even when I read in a 5 MB file, all I get when I print the buffer is
some variation of 'RIFFxxx' (where xxx is an arbitrary set of 3 characters).

>>> x = open('d:\\binary.wav')
>>> buf = x.read()
>>> print buf
'RIFFnap'
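
A likely factor here, assuming this is running on Windows (the d:\ path
suggests so), is that open() without a mode argument opens the file in text
mode, where line endings may be translated and a Ctrl-Z byte may be treated as
end of file.  The console may also stop displaying a string at the first NUL
byte, so print is not a reliable way to judge how much was read.  A minimal
sketch that reads in binary mode and reports the length instead; the helper
name peek_binary is hypothetical:

```python
def peek_binary(path, preview=16):
    """Read a file in binary mode; return (size in bytes, repr of the first bytes)."""
    with open(path, "rb") as f:   # "rb": no newline translation, no early end of file
        buf = f.read()
    # repr() shows non-printable bytes as escapes instead of sending them to the console
    return len(buf), repr(buf[:preview])

# Hypothetical usage with the path from the session above:
# size, head = peek_binary(r"d:\binary.wav")
# print(size)
# print(head)
```

If the reported size matches the size of the file on disk, the buffer holds
the whole file, whatever print happens to display.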

Anyway, if any of you have a better suggestion for me, I'd really appreciate
it.

Basically all I'm looking for is an efficient method of comparing binary
data files.
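
One possible approach, sketched here under the assumption that both files are
ordinary files on disk, is to skip hashing and compare the bytes directly:
check the sizes first, then read both files in binary mode chunk by chunk and
stop at the first difference.  The name files_identical and the chunk size are
hypothetical; the standard library's filecmp.cmp(a, b, shallow=False) performs
a similar content comparison.

```python
import os

def files_identical(path_a, path_b, chunk_size=65536):
    """Return True if the two files contain exactly the same bytes."""
    # Files of different sizes cannot match, so nothing needs to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:   # first differing chunk: stop early
                return False
            if not chunk_a:          # both files exhausted with no difference
                return True
```

Compared with hashing, this reads no more data than it needs to: files that
differ near the start are rejected after a single chunk.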

Thanks for your time,
christian




