Does hashlib support a file mode?

Chris Rebert clp2 at rebertia.com
Wed Jul 6 02:44:28 EDT 2011


On Tue, Jul 5, 2011 at 10:54 PM, Phlip <phlip2005 at gmail.com> wrote:
> Pythonistas:
>
> Consider this hashing code:
>
>  import hashlib
>  file = open(path)
>  m = hashlib.md5()
>  m.update(file.read())
>  digest = m.hexdigest()
>  file.close()
>
> If the file were huge, the file.read() would allocate a big string and
> thrash memory. (Yes, in 2011 that's still a problem, because these
> files could be movies and whatnot.)
>
> So if I do the stream trick - read one byte, update one byte, in a
> loop, then I'm essentially dragging that movie thru 8 bits of a 64 bit
> CPU. So that's the same problem; it would still be slow.
>
> So now I try this:
>
>  sum = os.popen('sha256sum %r' % path).read()
>
> Those of you who like to lie awake at night thinking of new ways to
> flame abusers of 'eval()' may have a good vent, there.

Indeed (*eyelid twitch*). That one-liner is arguably better written as:
    sum = subprocess.check_output(['sha256sum', path])
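
Note that check_output() returns everything sha256sum prints, which is the hex digest followed by the filename, not the bare digest. A rough sketch of pulling out just the digest follows. The helper names parse_sha256sum_output and sha256_via_subprocess are made up for illustration, and it assumes a sha256sum binary on the PATH and an ordinary filename:

```python
import subprocess


def parse_sha256sum_output(raw):
    """Return the hex digest from one line of sha256sum output (bytes)."""
    # The output line looks like b"<hex digest>  <filename>\n",
    # so the digest is the first whitespace-separated field.
    return raw.split(None, 1)[0].decode("ascii")


def sha256_via_subprocess(path):
    """Hash a file by shelling out to sha256sum (assumed to be installed)."""
    # Passing an argument list avoids the shell, so no quoting of path is needed.
    raw = subprocess.check_output(["sha256sum", path])
    return parse_sha256sum_output(raw)
```

Because the argument list bypasses the shell entirely, the %r quoting hack in the original one-liner is no longer needed.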

> Does hashlib have a file-ready mode, to hide the streaming inside some
> clever DMA operations?

Barring undocumented voodoo, no, it doesn't appear to. You could
always read the file in suitably large chunks instead (rather than
byte-by-byte, which is indeed ridiculous) and pass each chunk to
update(). For ideas on choosing a chunk size, see
io.DEFAULT_BUFFER_SIZE and/or the os.stat() trick referenced therein
and/or the block_size attribute of hash objects.
http://docs.python.org/library/io.html#io.DEFAULT_BUFFER_SIZE
http://docs.python.org/library/hashlib.html#hashlib.hash.block_size
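
For what it's worth, the chunked approach might look something like the sketch below. The function name file_hexdigest and the default chunk size (an arbitrary multiple of io.DEFAULT_BUFFER_SIZE) are my own choices, not anything hashlib provides. The file is opened in binary mode, since hash objects want bytes.

```python
import hashlib
import io


def file_hexdigest(path, algorithm="md5",
                   chunk_size=io.DEFAULT_BUFFER_SIZE * 128):
    """Hash a file incrementally so only one chunk is in memory at a time."""
    # chunk_size is an arbitrary guess; tune it for your workload.
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size)
        # until it returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Feeding the data to update() piecewise gives the same digest as hashing it all at once, so memory use stays bounded by chunk_size regardless of how big the movie is.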

Cheers,
Chris
--
http://rebertia.com


