binary file compare...

Adam Olsen rhamph at
Wed Apr 15 20:37:41 CEST 2009

On Apr 15, 11:04 am, Nigel Rantor <wig... at> wrote:
> The fact that two md5 hashes are equal does not mean that the sources
> they were generated from are equal. To do that you must still perform a
> byte-by-byte comparison which is much less work for the processor than
> generating an md5 or sha hash.
> If you insist on using a hashing algorithm to determine the equivalence
> of two files you will eventually realise that it is a flawed plan
> because you will eventually find two files with different contents that
> nonetheless hash to the same value.
> The more files you test with the quicker you will find out this basic truth.
> This is not complex, it's a simple fact about how hashing algorithms work.

The only flaw in a cryptographic hash is the growing number of
attacks that are found against it.  You need to pick a trusted one when
you start and consider replacing it every few years.
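
One way to make that replacement cheap is to keep the algorithm a
parameter rather than hard-coding it. A minimal sketch, assuming
Python's standard hashlib module; the helper names file_digest_hex and
same_digest and the "sha256" default are illustrative choices of mine,
not something from this thread:

```python
import hashlib

def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`.

    `algorithm` is any name accepted by hashlib.new(); the default is
    only an assumption and is meant to be swapped when it ages.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files never sit in memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_digest(path_a, path_b, algorithm="sha256"):
    """True if both files produce the same digest under `algorithm`."""
    return file_digest_hex(path_a, algorithm) == file_digest_hex(path_b, algorithm)
```

Switching hashes later is then a one-argument change, e.g.
same_digest(a, b, algorithm="sha512").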

The chance of *accidentally* producing a collision, although
technically nonzero, is so extraordinarily small that it's completely
overshadowed by the risk of a hardware or software failure producing
an incorrect result.
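
To put a rough number on "extraordinarily rare", here is a sketch of
the usual birthday-bound approximation. It assumes an ideal hash whose
outputs are uniformly distributed, which no real hash strictly
guarantees, and the function name is my own:

```python
import math

def accidental_collision_probability(num_files, hash_bits):
    """Approximate chance that any two of `num_files` distinct inputs
    share a digest, assuming an ideal hash with `hash_bits` output bits.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    pairs = num_files * (num_files - 1) // 2   # number of distinct pairs
    exponent = pairs / 2**hash_bits            # expected number of colliding pairs
    # expm1 keeps precision when the probability is vanishingly small.
    return -math.expm1(-exponent)

if __name__ == "__main__":
    # A billion files hashed with a 128-bit digest (MD5's output size).
    print(accidental_collision_probability(10**9, 128))
```

For a billion files and a 128-bit digest this comes out on the order of
1e-21; under the same assumption it only approaches even odds at around
2**64 files.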

More information about the Python-list mailing list