binary file compare...
davea at ieee.org
Tue Apr 14 03:11:28 CEST 2009
> On Apr 13, 2:37 pm, Grant Edwards <invalid at invalid> wrote:
>> On 2009-04-13, Grant Edwards <invalid at invalid> wrote:
>>> On 2009-04-13, SpreadTooThin <bjobrie... at gmail.com> wrote:
>>>> I want to compare two binary files and see if they are the same.
>>>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
>>>> that it is doing a byte by byte comparison of two files to see if they
>>>> are the same.
>>> Perhaps I'm being dim, but how else are you going to decide if
>>> two files are the same unless you compare the bytes in the
>>> files?
>>> You could hash them and compare the hashes, but that's a lot
>>> more work than just comparing the two byte streams.
>>>> What should I be using if not filecmp.cmp?
>>> I don't understand what you've got against comparing the files
>>> when you stated that what you wanted to do was compare the files.
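Grant's point about hashing can be made concrete. Here is a minimal sketch using SHA-256 from the standard hashlib module. The algorithm, the 64 KiB chunk size, and the helper names file_digest and same_by_hash are illustrative assumptions. Both files still have to be read in full, and a digest has to be computed on top of that, which is why a direct byte comparison is cheaper for a single pair:

```python
import hashlib


def file_digest(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def same_by_hash(path1, path2):
    """Compare two files by digest (hypothetical helper).

    Equal digests imply equal contents with overwhelming probability,
    not with certainty.
    """
    return file_digest(path1) == file_digest(path2)
```

Hashing mainly pays off when one file must be checked against many others, because each file's digest is computed only once.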
>> Doh! I misread your post and thought you weren't getting a
>> warm fuzzy feeling _because_ it was doing a byte-by-byte
>> compare. Now I'm a bit confused. Are you under the impression
>> it's _not_ doing a byte-by-byte compare? Here's the code:
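>> BUFSIZE = 8*1024  # module-level constant defined in filecmp.py
>>
>> def _do_cmp(f1, f2):
>>     bufsize = BUFSIZE
>>     fp1 = open(f1, 'rb')
>>     fp2 = open(f2, 'rb')
>>     while True:
>>         # Read both files in matching fixed-size chunks.
>>         b1 = fp1.read(bufsize)
>>         b2 = fp2.read(bufsize)
>>         if b1 != b2:
>>             return False
>>         if not b1:
>>             # Both files ran out at the same point: contents match.
>>             return True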
>> It looks like a byte-by-byte comparison to me. Note that when
>> this function is called the file lengths have already been
>> compared and found to be equal.
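A self-contained sketch of the same chunked technique, for anyone who wants the comparison without relying on filecmp's defaults. The function name same_contents is hypothetical, and the 8 KiB buffer simply mirrors filecmp's BUFSIZE. It does its own length check (the early exit Grant mentions filecmp performs before calling _do_cmp) and closes both files via with:

```python
import os


def same_contents(path1, path2, bufsize=8 * 1024):
    """Byte-by-byte comparison of two files, read in fixed-size chunks."""
    # Cheap early exit: files of different length cannot be equal.
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False
    with open(path1, 'rb') as fp1, open(path2, 'rb') as fp2:
        while True:
            b1 = fp1.read(bufsize)
            b2 = fp2.read(bufsize)
            if b1 != b2:
                return False  # first chunk that differs
            if not b1:
                return True   # both files exhausted together
```

The loop would still be correct without the length check: when one file ends first, its shorter (or empty) chunk fails the b1 != b2 test. The check only avoids reading anything when the lengths already differ.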
> I am indeed under the impression that it is not always doing a
> byte-by-byte comparison...
> As well, the documentation states:
> Compare the files named f1 and f2, returning True if they seem equal,
> False otherwise.
> That word... Seeeeem... makes me wonder.
> Thanks for the code! :)
Some of this discussion depends on the version of Python, but you didn't
say which one you're using. In version 2.6.1, the code is different (and
more complex) than what's listed above, and so are the docs. In this
version, at least, you'll want to explicitly pass the shallow=False
parameter. It defaults to 1, by which they must mean True. With that
default, two files whose os.stat() signatures (file type, size, and
modification time) match are reported equal without their contents ever
being read. I think it's a bad default, but it's still a useful function.
Just be careful to include that parameter in your call.
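A small demonstration of why the default matters, as a sketch in current Python 3 (where filecmp.cmp still defaults to shallow=True). It writes two same-length files with different bytes, then uses os.utime to give them an identical modification time, so their os.stat() signatures (file type, size, mtime) match. The helper make_file, the file names, and the timestamp are arbitrary illustrative choices:

```python
import filecmp
import os
import tempfile


def make_file(directory, name, data, mtime):
    """Write data to directory/name and force its access/modification time."""
    path = os.path.join(directory, name)
    with open(path, 'wb') as f:
        f.write(data)
    os.utime(path, (mtime, mtime))
    return path


with tempfile.TemporaryDirectory() as d:
    # Same length, same mtime, different last byte.
    a = make_file(d, 'a.bin', b'\x00\x01\x02\x03', 1_000_000_000)
    b = make_file(d, 'b.bin', b'\x00\x01\x02\xff', 1_000_000_000)
    shallow_result = filecmp.cmp(a, b)               # signatures only
    deep_result = filecmp.cmp(a, b, shallow=False)   # reads the contents
    print(shallow_result, deep_result)  # True False
```

Only the shallow=False call notices the differing byte.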
Further, you'll want to check the filecmp source that ships with your own
Python version. The file filecmp.py is in the Lib directory, so it's no
trouble to read it.
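Rather than hunting through the Lib directory by hand, the interpreter can report which filecmp.py it actually imports and print the source of cmp(). A minimal sketch using the standard inspect module (inspect.signature needs Python 3.3 or later):

```python
import filecmp
import inspect

# Path of the filecmp module this interpreter actually imports.
print(filecmp.__file__)

# Source of cmp() as shipped with this interpreter, including its
# default value for the shallow parameter.
print(inspect.getsource(filecmp.cmp))

# The default can also be read programmatically.
shallow_default = inspect.signature(filecmp.cmp).parameters['shallow'].default
print(shallow_default)
```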