binary file comparison with the md5 module
Remco Gerlich
scarblac at pino.selwerd.nl
Thu Jun 14 02:30:02 EDT 2001
Christian Reyes <christian at rocketnetwork.com> wrote in comp.lang.python:
> I'm trying to write a script that takes two binary files and returns whether
> or not their data is completely matching.
>
> One of my peers suggested that an efficient way to do this would be to run
> the md5 algorithm on each file and then compare the resultant output. Since
> md5 returns a unique 128-bit checksum of its input, this should
> theoretically work.
>
> The problem I'm having is with reading the binary file in as a string.
>
> I tried opening the file with the built-in python open command, and then
> reading the contents of the file into a buffer. But I think my problem is
> that when I read the binary file into a buffer, the contents get tweaked
> somehow. I would expect the print statement to give me some huge string of
> gibberish but instead what I get is 'RIFFnap'. Regardless of what size the
> file is. I'll try to read in a 5 meg file and all I get when I try to print
> the buffer is some variation of 'RIFFxxx' (where xxx is any arbitrary set of
> 3 characters).
>
> >>> x = open('d:\\binary.wav')

You need to open the file in binary mode:
x = open("d:\\binary.wav", "rb")
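
For readers on Python 3, here is a minimal sketch of the whole comparison. It assumes the hashlib module, because the standalone md5 module used in this thread no longer exists in Python 3. The function names and the 64 KiB chunk size are illustrative choices, not part of the original answer. Both files are opened with "rb", as above, and are read in chunks so a large file never has to sit in memory all at once.

```python
import hashlib

# Illustrative chunk size (an assumption, not from the thread): 64 KiB per read.
CHUNK_SIZE = 64 * 1024


def md5_of_file(path):
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    # "rb" is the essential part: bytes come back unchanged, with no
    # newline or end-of-file translation.
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match_md5(path_a, path_b):
    """Hypothetical helper: True if both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)


def files_match_bytes(path_a, path_b):
    """Hypothetical helper: True if both files contain exactly the same bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(CHUNK_SIZE)
            b = fb.read(CHUNK_SIZE)
            if a != b:
                # Contents or lengths differ: stop at the first mismatch.
                return False
            if not a:
                # Both files reached end of file at the same point.
                return True
```

Usage, with hypothetical file names: `files_match_md5("d:\\a.wav", "d:\\b.wav")`. For a single pair of files, hashing is not actually cheaper than comparing the bytes directly. Both files must be read in full either way, and the direct comparison can stop at the first difference. Digests pay off when one file is checked against many others or when the digest can be stored and reused. Equal MD5 digests are also not a strict proof of equal content, because MD5 collisions can be constructed deliberately. The standard library's `filecmp.cmp(a, b, shallow=False)` also compares file contents.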
--
Remco Gerlich
More information about the Python-list mailing list