binary file compare...

SpreadTooThin bjobrien62 at gmail.com
Wed Apr 15 13:26:18 EDT 2009


On Apr 15, 8:04 am, Grant Edwards <invalid at invalid> wrote:
> On 2009-04-15, Martin <mar... at marcher.name> wrote:
>
>
>
> > Hi,
>
> > On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards <invalid at invalid> wrote:
> >> On 2009-04-13, SpreadTooThin <bjobrie... at gmail.com> wrote:
>
> >>> I want to compare two binary files and see if they are the same.
> >>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
> >>> that it is doing a byte by byte comparison of two files to see if they
> >>> are the same.
>
> >> Perhaps I'm being dim, but how else are you going to decide if
> >> two files are the same unless you compare the bytes in the
> >> files?
>
> > I'd say checksums; just about every download relies on checksums to
> > verify that you do indeed have the same file.
>
> That's slower than a byte-by-byte compare.
>
> >> You could hash them and compare the hashes, but that's a lot
> >> more work than just comparing the two byte streams.
>
> > Hashing is not exactly much work; in its simplest form it's 2
> > lines per file.
>
> I meant a lot more CPU time/cycles.
>
> --
> Grant Edwards                   grante             Yow! Was my SOY LOAF left
>                                   at               out in th'RAIN?  It tastes
>                                visi.com            REAL GOOD!!

I'd like to add my 2 cents here. (That's 1.8 cents US.)
All I was trying to get was a clarification of the documentation of
the filecmp.cmp function.
It isn't clear whether it compares the files' contents.

A byte-by-byte comparison is good enough for me, as long as there are no
cache issues.
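
To make "byte by byte" concrete, here is a minimal sketch of such a comparison. The helper name files_identical and the 64 KiB chunk size are my own assumptions for illustration, not anything from the filecmp module. It reads both files directly on every call, so no cached result is involved.

```python
import os

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch


def files_identical(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Return True if both files hold exactly the same bytes (hypothetical helper)."""
    # Files of different sizes can never match, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first differing chunk ends the comparison
            if not chunk_a:
                return True  # both files exhausted with no difference
```

As far as I can tell from the documentation, filecmp.cmp(path_a, path_b, shallow=False) should give the same answer, since with shallow set to false it compares contents instead of trusting the os.stat() signatures alone.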
A simple checksum is not good enough, because a sum cannot tell 1 + 2 + 3
apart from 3 + 2 + 1.
A CRC of any sort is more work than a byte-by-byte comparison and
doesn't give you any more information.
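
The ordering problem with a plain sum can be shown in a few lines. This is only a sketch with a deliberately naive checksum; additive_checksum is a made-up name for illustration, not a function from any library.

```python
def additive_checksum(data):
    """Sum of all byte values modulo 256 (a deliberately naive checksum)."""
    return sum(bytearray(data)) % 256


first = b"\x01\x02\x03"
second = b"\x03\x02\x01"

# The two byte strings differ, yet their sums are equal.
same_checksum = additive_checksum(first) == additive_checksum(second)
same_bytes = first == second
print(same_checksum, same_bytes)  # prints: True False
```

A direct comparison of the bytes catches the difference that the sum misses.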




