Does hashlib support a file mode?
clp2 at rebertia.com
Wed Jul 6 08:44:28 CEST 2011
On Tue, Jul 5, 2011 at 10:54 PM, Phlip <phlip2005 at gmail.com> wrote:
> Consider this hashing code:
> import hashlib
> file = open(path)
> m = hashlib.md5()
> m.update(file.read())
> digest = m.hexdigest()
> If the file were huge, the file.read() would allocate a big string and
> thrash memory. (Yes, in 2011 that's still a problem, because these
> files could be movies and whatnot.)
> So if I do the stream trick - read one byte, update one byte, in a
> loop, then I'm essentially dragging that movie thru 8 bits of a 64 bit
> CPU. So that's the same problem; it would still be slow.
> So now I try this:
> sum = os.popen('sha256sum %r' % path).read()
> Those of you who like to lie awake at night thinking of new ways to
> flame abusers of 'eval()' may have a good vent, there.
Indeed (*eyelid twitch*). That one-liner is arguably better written as:
sum = subprocess.check_output(['sha256sum', path])
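Note that sha256sum prints the digest followed by the file name, so the raw output is not yet the bare hash. A minimal sketch of pulling the digest out, assuming a GNU-style sha256sum is on the PATH and the file name is an ordinary one (the helper name sha256sum_hex is made up for illustration):

```python
import subprocess

def sha256sum_hex(path):
    # Hypothetical helper. Passing an argument list bypasses the shell,
    # so `path` needs no quoting or escaping.
    out = subprocess.check_output(['sha256sum', path])
    # Assumes output of the form b"<hex digest>  <file name>\n";
    # the digest is the first whitespace-separated field.
    return out.split()[0].decode('ascii')
```

check_output raises CalledProcessError if sha256sum exits with a non-zero status, e.g. when the file does not exist.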
> Does hashlib have a file-ready mode, to hide the streaming inside some
> clever DMA operations?
Barring undocumented voodoo, no, it doesn't appear to. You could
always read from the file in suitably large chunks instead (rather
than byte-by-byte, which is indeed ridiculous); see
io.DEFAULT_BUFFER_SIZE and/or the os.stat() trick referenced therein
and/or the block_size attribute of hash objects.
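Something along these lines should do; it is only a sketch, and the helper name hash_file and the default chunk size (an arbitrary multiple of io.DEFAULT_BUFFER_SIZE) are my own assumptions, not anything hashlib prescribes:

```python
import hashlib
import io

def hash_file(path, algorithm='sha256',
              chunk_size=io.DEFAULT_BUFFER_SIZE * 128):
    # Hypothetical helper; chunk_size is an arbitrary choice, tune as needed.
    h = hashlib.new(algorithm)
    # Open in binary mode so the bytes are hashed exactly as stored on disk.
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read(chunk_size)
        # until it returns b'' at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()
```

Only one chunk is held in memory at a time, and the result is the same as hashing the whole file in a single update() call.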