binary file comparison with the md5 module
christian at rocketnetwork.com
Wed Jun 13 20:22:01 CEST 2001
After some more research I have discovered the very handy "filecmp" module.
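A minimal sketch of how that could look, assuming a current Python 3 interpreter (the helper name files_match is made up for illustration). filecmp.cmp() with shallow=False compares the file contents rather than only the os.stat() signatures:

```python
import filecmp


def files_match(path_a, path_b):
    """Return True if both files contain exactly the same bytes.

    files_match is a hypothetical helper name, not part of filecmp.
    """
    # shallow=False asks filecmp to compare the contents themselves
    # instead of trusting matching size/mtime signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Usage would be something like files_match('d:\\one.wav', 'd:\\two.wav'), where both paths are placeholders.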
"Christian Reyes" <christian at rocketnetwork.com> wrote in message
news:9g8ahr$s6t$1 at bob.news.rcn.net...
> I'm trying to write a script that takes two binary files and returns whether
> or not their data is completely matching.
> One of my peers suggested that an efficient way to do this would be to run
> the md5 algorithm on each file and then compare the resultant output.
> md5 returns a unique 128-bit checksum of its input, so this should
> theoretically work.
> The problem I'm having is with reading the binary file in as a string.
> I tried opening the file with the built-in python open command, and then
> reading the contents of the file into a buffer. But I think my problem is
> that when I read the binary file into a buffer, the contents get tweaked
> somehow. I would expect the print statement to give me some huge string of
> gibberish, but instead what I get is 'RIFFnap', regardless of what size the
> file is. I'll try to read in a 5 meg file and all I get when I try to print
> the buffer is some variation of 'RIFFxxx' (where xxx is any arbitrary set of
> 3 characters).
> >>> x = open('d:\\binary.wav')
> >>> buf = x.read()
> >>> print buf
> Anyway, if any of you have a better suggestion for me, I'd really
> appreciate it.
> Basically all I'm looking for is an efficient method of comparing binary
> data files.
> data files.
> Thanks for your time,
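
For the md5 approach described in the quoted message, here is a rough sketch, assuming Python 3, where hashlib takes the place of the old md5 module. The names md5_of_file and same_md5 and the chunk size are illustrative choices only. The file is opened with 'rb'; the truncated 'RIFFxxx' output is most likely a side effect of text mode on Windows, which stops reading at a Ctrl-Z (0x1A) byte.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in binary mode.

    md5_of_file and chunk_size are hypothetical names for this sketch.
    """
    digest = hashlib.md5()
    # 'rb' keeps the platform from translating or truncating the data.
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large file is never held in memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_md5(path_a, path_b):
    """Return True if both files hash to the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Note that equal digests only make identical contents extremely likely, whereas a byte-by-byte comparison is exact and can stop at the first difference.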