binary file compare...

norseman norseman at hughes.net
Fri Apr 17 17:59:27 CEST 2009


Adam Olsen wrote:
> On Apr 16, 11:15 am, SpreadTooThin <bjobrie... at gmail.com> wrote:
>> And yes, he is right: CRCs and hashes all have some probability of saying
>> that the files are identical when in fact they are not.
> 
> Here's the bottom line.  It is either:
> 
> A) Several hundred years of mathematics and cryptography are wrong.
> The birthday problem as described is incorrect, so a collision is far
> more likely than 42 trillion trillion to 1.  You are simply the first
> person to have noticed it.
> 
> B) Your software was buggy, or possibly the input was maliciously
> produced.  Or, a really tiny chance that your particular files
> contained a pattern that provoked bad behaviour from MD5.
> 
> Finding a specific limitation of the algorithm is one thing.  Claiming
> that the math is fundamentally wrong is quite another.
================================
Spending a lifetime in applied math has taught me:
	1) All applied math is finite.
	2) Any algorithm failing to handle all contingencies is flawed.

The meaning of 1) is that it is limited in what it can actually do.
The meaning of 2) is that the designer missed or left out something.

Neither should be taken as bad. Both need to be accepted 'as is', and 
the decision to use an algorithm (when, where, under what conditions) 
should be based on the probability of non-failure.
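
To put a rough number on "probability of non-failure" for a hash check, 
here is a minimal sketch using the standard birthday-bound approximation. 
It assumes the hash output is uniformly distributed over 2**hash_bits 
values and that nobody crafted the inputs to collide; the function name 
and parameters are made up for illustration.

```python
import math

def collision_probability(n_files, hash_bits=128):
    """Approximate chance that at least two of n_files share a digest.

    Assumes an ideal hash with uniformly distributed output and
    non-malicious input (assumptions, not properties of any real hash).
    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    exponent = n_files * (n_files - 1) / (2 * 2 ** hash_bits)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)

if __name__ == "__main__":
    for n in (2, 10 ** 6, 10 ** 9, 2 ** 64):
        print(n, collision_probability(n))
```

The number is tiny for any realistic file count, but it is never zero, 
and it says nothing about inputs built on purpose to collide.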


"...a pattern that provoked bad behaviour..." does mean the algorithm is 
incomplete and may be fundamentally wrong. Underscore "is" and "may".
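
For the file-compare case in this thread, one way to cover the 
contingency a digest cannot is to compare the bytes themselves. The 
following is only a sketch; the function name and chunk size are 
arbitrary choices for illustration, and it assumes both paths are 
ordinary files that are not changing while they are read.

```python
import os

def files_identical(path_a, path_b, chunk_size=64 * 1024):
    """Return True only if both files hold exactly the same bytes."""
    # Files of different sizes cannot match, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files ended at the same point
                return True
```

The standard library's filecmp.cmp(a, b, shallow=False) does a 
comparable content check. A hash can still serve as a quick first 
filter, with the byte compare confirming any reported match.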

The more complicated the math, the harder it is to keep a higher form of 
math from checking (or improperly displacing) a lower one, which, of 
course, breaks the rules. This is commonly called improper thinking, and 
a number of math teasers make use of it.



Steve


