[Tutor] hashlib weirdness

Terry Carroll carroll at tjc.com
Mon Apr 2 19:45:38 CEST 2007


On 30 Mar 2007, Greg Perry wrote:

> Here's one that has me stumped.
> 
> I am writing a forensic analysis tool that takes either a file or a
> directory as input, then calculates a hash digest based on the contents
> of each file.
> 
> I have created an md5 hash object using the hashlib module:
> 
> m = hashlib.md5()
> 
> I then load in a file in binary mode:
> 
> f = open("c:\python25\python.exe", "rb")
> 
> According to the docs, the hashlib update function will update the hash
> object with the string arg.  So:
> 
> m.update(f.read())
> m.hexdigest()
> 
> The md5 hash is not correct for the file.

Odd.  It's correct for me:

In Python:

>>> import hashlib
>>> m = hashlib.md5()
>>> f = open(r"c:\python25\python.exe", "rb")
>>> m.update(f.read())
>>> m.hexdigest()
'7e7c8ae25d268636a3794f16c0c21d7c'

Now, check against the md5 as calculated by the md5sum utility:

>md5sum c:\Python25\python.exe
\7e7c8ae25d268636a3794f16c0c21d7c *c:\\Python25\\python.exe
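
The leading backslash in that md5sum line is not part of the digest. As far as I know, GNU md5sum prefixes the line with a backslash and doubles the backslashes in the file name whenever the name contains one. If you want to compare md5sum output with hexdigest() in a script, something along these lines may help. It is only a sketch: parse_md5sum_line is a name I made up, and it assumes the usual "digest, space, mode character, name" layout with only backslash and newline escaped.

```python
import re


def parse_md5sum_line(line):
    """Split one line of md5sum output into (hex_digest, file_name).

    Assumed layout (not taken from the original post): a 32-character
    hex digest, one space, a mode character (' ' or '*'), then the name.
    A leading backslash marks a line whose file name has been escaped.
    """
    line = line.rstrip("\r\n")
    escaped = line.startswith("\\")
    if escaped:
        line = line[1:]
    digest = line[:32]
    name = line[34:]  # skip the space and the mode character
    if escaped:
        # Undo the escaping: "\\" becomes a backslash, "\n" a newline.
        name = re.sub(
            r"\\(.)",
            lambda m: "\n" if m.group(1) == "n" else m.group(1),
            name,
        )
    return digest, name
```

With the line above, the digest that comes back is the same 32 hex characters that hexdigest() printed, and the name is the path with single backslashes.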


> f.seek(0)
> hashlib.md5(f.read()).hexdigest()

No difference here:

>>> f.close()
>>> f = open(r"c:\python25\python.exe", "rb")
>>> hashlib.md5(f.read()).hexdigest()
'7e7c8ae25d268636a3794f16c0c21d7c'
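
Since the tool is meant to take either a file or a directory, here is a rough sketch of how that could look. It reads each file in binary mode in fixed-size chunks, so a large file does not have to fit in memory in a single f.read(). The names md5_of_file and md5_of_tree and the 64 KB chunk size are my own choices for illustration, not anything from the original code.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of one file, read in binary chunks."""
    m = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            m.update(chunk)
    return m.hexdigest()


def md5_of_tree(top):
    """Map each file under top (or top itself, if it is a file) to its digest."""
    if os.path.isfile(top):
        return {top: md5_of_file(top)}
    digests = {}
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[full] = md5_of_file(full)
    return digests
```

Calling update() several times with consecutive chunks gives the same digest as a single update() with the whole contents, so md5_of_file should agree with the hexdigest() values shown above.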


