Does hashlib support a file mode?
Thomas Rachel
nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915 at spamschutz.glglgl.de
Wed Jul 6 02:37:47 EDT 2011
On 06.07.2011 07:54, Phlip wrote:
> Pythonistas:
>
> Consider this hashing code:
>
> import hashlib
> file = open(path)
> m = hashlib.md5()
> m.update(file.read())
> digest = m.hexdigest()
> file.close()
>
> If the file were huge, the file.read() would allocate a big string and
> thrash memory. (Yes, in 2011 that's still a problem, because these
> files could be movies and whatnot.)
>
> So if I do the stream trick - read one byte, update one byte, in a
> loop, then I'm essentially dragging that movie thru 8 bits of a 64 bit
> CPU. So that's the same problem; it would still be slow.
Yes. That is why you should read with a reasonable block size. Not too
small and not too big.
def filechunks(f, size=8192):
    while True:
        s = f.read(size)
        if not s: break
        yield s
    # f.close() # maybe...
import hashlib
file = open(path, 'rb')  # binary mode, so the raw bytes get hashed
m = hashlib.md5()
fc = filechunks(file)
for chunk in fc:
    m.update(chunk)
digest = m.hexdigest()
file.close()
So you are reading in 8 KiB chunks. Feel free to modify this - maybe use
os.stat(path).st_blksize instead (which is AFAIK the recommended
minimum), or a value of about 1 MiB...
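The generator and the loop can also be folded into a single helper. This is only a minimal sketch: hash_file, its parameter names and the 64 KiB default are made up for illustration and are not part of hashlib.

```python
import hashlib

def hash_file(path, algorithm="md5", chunk_size=64 * 1024):
    """Return the hex digest of the file at `path`, read chunk by chunk.

    hash_file is a hypothetical helper, not a hashlib function.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size)
        # until it returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Only one chunk is held in memory at a time, so the file size does not matter. The with block also takes care of closing the file.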
> So now I try this:
>
> sum = os.popen('sha256sum %r' % path).read()
This is not as nice as the above, especially not with a path containing
strange characters. What about, at least,
def shellquote(*strs):
    return " ".join([
        "'" + st.replace("'", "'\\''") + "'"
        for st in strs
    ])
# %s, not %r: the path is already quoted for the shell
sum = os.popen('sha256sum %s' % shellquote(path)).read()
or, even better,
import subprocess
# argument list, so no shell is involved and no quoting is needed
sp = subprocess.Popen(['sha256sum', path],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)
sp.stdin.close() # generate EOF
sum = sp.stdout.read() # "<digest>  <path>\n"
sp.wait()
?
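If a shell command string is needed after all, newer Pythons (3.3 and later) ship shlex.quote, which does the same job as a hand-written quoting function. A small sketch; shell_command is a name invented here for illustration, and the code only builds the command string without running it:

```python
import shlex

def shell_command(path):
    """Build a 'sha256sum <path>' command line that is safe to hand to a shell.

    shell_command is a hypothetical helper; shlex.quote is the stdlib call.
    """
    return "sha256sum " + shlex.quote(path)

# shlex.split parses the string the way a POSIX shell would,
# so this shows the path surviving as one single argument.
print(shlex.split(shell_command("it's a movie.avi")))
```

The argument-list form of subprocess.Popen is still preferable, since it bypasses the shell entirely.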
> Does hashlib have a file-ready mode, to hide the streaming inside some
> clever DMA operations?
AFAIK not.
Thomas