binary file comparison with the md5 module
Fredrik Lundh
fredrik at pythonware.com
Wed Jun 13 15:02:04 EDT 2001
Christian Reyes wrote:
> I'm trying to write a script that takes two binary files and returns whether
> or not their data is completely matching.
>
> One of my peers suggested that an efficient way to do this would be to run
> the md5 algorithm on each file and then compare the resultant output.
if you're comparing two binary files, that's not very efficient -- you
really don't have to read the *entire* file to figure out if there are
any differences...
a better solution is to start by comparing the sizes (if they're different,
the files cannot possibly have the same content), and then read
same-sized chunks from both files. as soon as two chunks differ, the
files are different.
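as a rough illustration of that scheme, here is a minimal hand-written sketch. the helper name same_contents and the 8192-byte chunk size are assumptions made for this example, not something taken from the standard library:

```python
import os

def same_contents(path1, path2, chunk_size=8192):
    # hypothetical helper: True if both files hold identical bytes
    # different sizes means the contents cannot match, so stop early
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False
    # binary mode, so the bytes are compared exactly as stored on disk
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            chunk1 = f1.read(chunk_size)
            chunk2 = f2.read(chunk_size)
            if chunk1 != chunk2:
                return False  # first mismatch: no need to read further
            if not chunk1:
                return True   # both files ended without a mismatch
```

files that differ near the start are rejected after a single chunk, which is where the saving over hashing both files in full comes from.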
the filecmp module implements this scheme:
    import filecmp

    if filecmp.cmp(file1, file2, shallow=0):
        print("same contents")
(the shallow=0 flag makes sure that filecmp.cmp checks the contents
even if the size and modification time attributes happen to match)
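to see what the flag changes, here is a small sketch that builds two files with the same size and modification time but different contents. the file names and the timestamp are made up for the example, and it assumes a filesystem where os.utime can set the modification time:

```python
import filecmp
import os
import tempfile

# two scratch files (hypothetical names): same length, different last byte
tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, "one.bin")
path2 = os.path.join(tmpdir, "two.bin")
with open(path1, "wb") as f:
    f.write(b"abcd")
with open(path2, "wb") as f:
    f.write(b"abcx")

# give both files the same (arbitrary) access and modification time
os.utime(path1, (1000000000, 1000000000))
os.utime(path2, (1000000000, 1000000000))

# shallow comparison only looks at the os.stat() signature
shallow_result = filecmp.cmp(path1, path2, shallow=True)
# non-shallow comparison reads the file contents
deep_result = filecmp.cmp(path1, path2, shallow=False)
print(shallow_result, deep_result)
```

under those assumptions the shallow call should report a match while the non-shallow call should not, which is why the flag matters when you care about the actual bytes.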
> I tried opening the file with the built-in python open command, and then
> reading the contents of the file into a buffer. But I think my problem is
> that when I read the binary file into a buffer, the contents get tweaked
> somehow.
>
> >>> x = open('d:\\binary.wav')
if you double-check the docs (look for "open" under builtin functions
in the library reference), you'll notice that Python opens files in text
mode by default. on Windows, text mode translates line endings, which
is why the data looks tweaked.
to open a binary file, add "rb" as the second argument to open:
    >>> x = open('d:/binary.wav', 'rb')
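a quick way to convince yourself that binary mode leaves the data alone is a round trip like the one below. the file name and the sample bytes are invented for the example; the bytes include a CR/LF pair and a Ctrl-Z, the kind of data that text mode may alter on some platforms:

```python
import os
import tempfile

# sample bytes (made up for this example)
data = b"RIFF\r\n\x1a\x00\xff"

# hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "sample.bin")

with open(path, "wb") as f:   # "wb": write the bytes untouched
    f.write(data)

with open(path, "rb") as f:   # "rb": read the bytes untouched
    read_back = f.read()

print(read_back == data)
```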
hope this helps!
</F>