sorting with expensive compares?

Steve Holden steve at
Fri Dec 23 19:42:04 CET 2005

bonono at wrote:
> Dan Stromberg wrote:
>>I've been using the following compare function, which in short checks, in
>>1) device number
>>2) inode number
>>3) file length
>>4) the beginning of the file
>>5) an md5 hash of the entire file
>>6) the entire file
> Why would #5 not be enough as an indicator that the files are identical?
Because it doesn't guarantee that the files are identical. It indicates, 
to a very high degree of probability (particularly when the file lengths 
are equal), that the two files are the same, but it doesn't guarantee it.
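To make the distinction concrete, Dan's cascade can be read as a cheap-to-expensive equality test. In that reading, a differing md5 proves the files differ, but a matching md5 only narrows the candidates, and only the final byte-for-byte pass settles the question. What follows is a minimal sketch assuming regular files on a local filesystem. The names `files_identical` and `md5_of` and the 64 KiB chunk size are illustrative assumptions, not Dan's actual code.

```python
import hashlib
import os

CHUNK = 64 * 1024  # assumed read size for the prefix check and for streaming


def md5_of(path):
    """Hypothetical helper: md5 hex digest of the whole file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def files_identical(a, b):
    """Hypothetical helper: cheap checks first, full byte comparison last."""
    sa, sb = os.stat(a), os.stat(b)
    # 1-2) same device and inode means the same file, so trivially identical
    if (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino):
        return True
    # 3) files of different lengths can never be identical
    if sa.st_size != sb.st_size:
        return False
    # 4) compare the beginning of each file
    with open(a, "rb") as fa, open(b, "rb") as fb:
        if fa.read(CHUNK) != fb.read(CHUNK):
            return False
    # 5) differing digests prove the files differ; equal digests prove nothing
    if md5_of(a) != md5_of(b):
        return False
    # 6) only a byte-for-byte comparison can confirm equality
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ba, bb = fa.read(CHUNK), fb.read(CHUNK)
            if ba != bb:
                return False
            if not ba:  # both files exhausted with no difference found
                return True
```

For one-off comparisons, the standard library's `filecmp.cmp(a, b, shallow=False)` similarly rejects files whose sizes differ before comparing their contents.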

Technically there are infinitely many inputs that can produce the same 
md5 hash.
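This follows from a counting (pigeonhole) argument. md5 always yields a 128-bit digest, so infinitely many possible inputs share only 2**128 possible outputs. Brute-forcing a real md5 collision this way is impractical, but shrinking the digest makes the effect visible. The sketch below truncates md5 to its first 2 bytes, purely as an assumption for demonstration. With only 65,536 possible values, a collision is guaranteed within 65,537 distinct inputs and in practice appears much sooner. The helpers `short_md5` and `find_collision` are hypothetical.

```python
import hashlib
from itertools import count


def short_md5(data, nbytes=2):
    """Hypothetical truncated md5: keep only the first nbytes of the digest."""
    return hashlib.md5(data).digest()[:nbytes]


def find_collision(nbytes=2):
    """Hash distinct inputs until two of them share a truncated digest."""
    seen = {}
    for i in count():
        data = str(i).encode()  # every input is distinct: b"0", b"1", ...
        h = short_md5(data, nbytes)
        if h in seen:
            return seen[h], data
        seen[h] = data


a, b = find_collision()
# Different inputs, identical (truncated) digest
print(a, b, a != b, short_md5(a) == short_md5(b))
```

Full md5 has exactly the same property, only with 2**128 buckets instead of 65,536.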

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

More information about the Python-list mailing list