md5 for large files

Erik Max Francis max at alcyone.com
Wed Mar 5 17:54:45 EST 2003


Andrew MacIntyre wrote:

> So how much memory does your machine have?  I routinely start the
> interpreter and do:
> 
> >>> import md5
> >>> md5.new(open('/some/file','rb').read()).hexdigest()
> 
> on files of megabytes.  On an 83MB ZIP file I just tried, this
> completed
> in less than 10 seconds (Athlon 1.4, 512MB RAM, OS/2).
> 
> This approach won't work where the file is larger than available VM,
> but
> others have given you recipes to deal with that.

Unless the machine is never going to do anything else except check MD5s
for big files, deliberately using tremendous amounts of memory --
especially enough so that swap is needed -- just to make your Python
code a few lines shorter is rather wasteful.  It's perfectly
straightforward to process the file in large chunks and use the .update
method of the md5 object to keep a running tally.  With very large
files, reading them in sizeable chunks (one mebibyte, say) is not going
to run detectably slower.
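A minimal sketch of that chunked approach (written against the modern
hashlib module, which replaced the old md5 module; the helper name
md5_file and the one-mebibyte chunk size are my own choices here):

```python
import hashlib

def md5_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            digest.update(chunk)  # running tally; no need to hold the whole file
    return digest.hexdigest()
```

Memory use stays bounded at roughly one chunk regardless of file size,
so a multi-gigabyte file hashes just as comfortably as a small one.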

-- 
 Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
 __ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/  \ It's a man's world, and you men can have it.
\__/ Katherine Anne Porter
    The laws list / http://www.alcyone.com/max/physics/laws/
 Laws, rules, principles, effects, paradoxes, etc. in physics.
