binary file compare...
Peter Otten
__peter__ at web.de
Mon Apr 13 17:25:37 EDT 2009
Grant Edwards wrote:
> On 2009-04-13, Grant Edwards <invalid at invalid> wrote:
>> On 2009-04-13, SpreadTooThin <bjobrien62 at gmail.com> wrote:
>>
>>> I want to compare two binary files and see if they are the same.
>>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
>>> that it is doing a byte by byte comparison of two files to see if they
>>> are the same.
>>
>> Perhaps I'm being dim, but how else are you going to decide if
>> two files are the same unless you compare the bytes in the
>> files?
>>
>> You could hash them and compare the hashes, but that's a lot
>> more work than just comparing the two byte streams.
>>
>>> What should I be using if not filecmp.cmp?
>>
>> I don't understand what you've got against comparing the files
>> when you stated that what you wanted to do was compare the files.
>
> Doh! I misread your post and thought you weren't getting a
> warm fuzzy feeling _because_ it was doing a byte-by-byte
> compare. Now I'm a bit confused. Are you under the impression
> it's _not_ doing a byte-by-byte compare? Here's the code:
>
> def _do_cmp(f1, f2):
>     bufsize = BUFSIZE
>     fp1 = open(f1, 'rb')
>     fp2 = open(f2, 'rb')
>     while True:
>         b1 = fp1.read(bufsize)
>         b2 = fp2.read(bufsize)
>         if b1 != b2:
>             return False
>         if not b1:
>             return True
>
> It looks like a byte-by-byte comparison to me. Note that when
> this function is called the file lengths have already been
> compared and found to be equal.
But there's a cache. A change of file contents may go undetected as long as
the file stats (type, size, and modification time) don't change:
$ cat fool_filecmp.py
import filecmp, shutil, sys

# Create three files with identical contents; "d" only donates its stats.
for fn in "adb":
    with open(fn, "w") as f:
        f.write("yadda")
shutil.copystat("d", "a")

# First comparison: the result is cached under the files' stat signatures.
filecmp.cmp("a", "b", False)

# Change a's contents but keep its size, then restore the old stats.
with open("a", "w") as f:
    f.write("*****")
shutil.copystat("d", "a")

if "--clear" in sys.argv:
    print "clearing cache"
    filecmp._cache.clear()

if filecmp.cmp("a", "b", False):
    print "file a and b are equal"
else:
    print "file a and b differ"
print "a's contents:", open("a").read()
print "b's contents:", open("b").read()
$ python2.6 fool_filecmp.py
file a and b are equal
a's contents: *****
b's contents: yadda
Oops. If you are paranoid you have to clear the cache before doing the
comparison:
$ python2.6 fool_filecmp.py --clear
clearing cache
file a and b differ
a's contents: *****
b's contents: yadda
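
For readers on a newer Python, here is a minimal sketch of two ways to avoid
the stale result shown above. It assumes Python 3.4 or later, where
filecmp.clear_cache() is a public function; the names files_equal and
paranoid_cmp are hypothetical helpers made up for this illustration, not part
of the standard library.

```python
import filecmp
import os


def files_equal(path_a, path_b, chunk_size=64 * 1024):
    """Compare two files byte for byte, without consulting any cache.

    Hypothetical helper: it always reads both files, so it cannot be
    fooled by unchanged stats, at the cost of never taking a shortcut.
    """
    # Files of different sizes cannot be equal, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files ended at the same point
                return True


def paranoid_cmp(path_a, path_b):
    """Drop filecmp's cached results, then compare file contents."""
    filecmp.clear_cache()  # public API since Python 3.4
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Calling files_equal("a", "b") after running the script above should report
that the files differ, because it never looks at the cache. paranoid_cmp keeps
using filecmp but clears its cache first, which mirrors the --clear branch of
the script without touching the private _cache attribute.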
Peter