Does hashlib support a file mode?
Adam Tauno Williams
awilliam at whitemice.org
Wed Jul 6 06:55:31 EDT 2011
On Tue, 2011-07-05 at 22:54 -0700, Phlip wrote:
> Pythonistas
> Consider this hashing code:
> import hashlib
> file = open(path)
> m = hashlib.md5()
> m.update(file.read())
> digest = m.hexdigest()
> file.close()
> If the file were huge, the file.read() would allocate a big string and
> thrash memory. (Yes, in 2011 that's still a problem, because these
> files could be movies and whatnot.)
Yes, the simple rule is: do not *ever* call file.read() with no size
argument. No matter what the year, this will never be OK. Always read a
file in reasonably sized I/O blocks.
For example, I use this function to copy a stream and return a SHA-512
digest and the output stream's size:
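    import hashlib

    def write(self, in_handle, out_handle):
        # Copy in_handle to out_handle in 4096-byte blocks, hashing as we go.
        m = hashlib.sha512()
        data = in_handle.read(4096)
        while True:
            if not data:
                break
            m.update(data)
            out_handle.write(data)
            data = in_handle.read(4096)
        out_handle.flush()
        # tell() equals the bytes copied only if in_handle started at offset 0.
        return (m.hexdigest(), in_handle.tell())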
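To try the function above without a class around it, here is a rough standalone sketch run against in-memory streams. The name copy_and_hash and the block_size parameter are my own for illustration. It counts the bytes it writes instead of calling tell(), so it should also work on streams that are not seekable.

```python
import hashlib
import io

def copy_and_hash(in_handle, out_handle, block_size=4096):
    """Copy in_handle to out_handle block by block.

    Returns (SHA-512 hex digest, number of bytes copied).
    """
    m = hashlib.sha512()
    copied = 0
    while True:
        data = in_handle.read(block_size)
        if not data:  # empty read means end of stream
            break
        m.update(data)
        out_handle.write(data)
        copied += len(data)
    out_handle.flush()
    return (m.hexdigest(), copied)

# Usage: a payload that spans several 4096-byte blocks.
payload = b"x" * 10000
src, dst = io.BytesIO(payload), io.BytesIO()
digest, size = copy_and_hash(src, dst)
```

The digest should match what hashlib.sha512(payload).hexdigest() gives for the same bytes in one shot, since update() calls are cumulative.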
> Does hashlib have a file-ready mode, to hide the streaming inside some
> clever DMA operations?
Chunk it to something close to the block size of your underlying
filesystem.
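For the hash-only case in the original question, a minimal sketch might look like the following. The function name hash_file and its parameters are hypothetical, not part of hashlib. It assumes os.stat() reports st_blksize (the filesystem's preferred I/O size), which is not available on every platform, so it falls back to 4096 bytes.

```python
import hashlib
import os

def hash_file(path, algorithm="md5", default_block=4096):
    """Return the hex digest of the file at path, reading it in blocks."""
    # st_blksize is missing on some platforms (e.g. Windows), so fall
    # back to default_block when it is absent or zero.
    block_size = getattr(os.stat(path), "st_blksize", 0) or default_block
    m = hashlib.new(algorithm)
    with open(path, "rb") as f:  # binary mode, so bytes are hashed unaltered
        while True:
            data = f.read(block_size)
            if not data:  # empty read means end of file
                break
            m.update(data)
    return m.hexdigest()
```

Only one block is held in memory at a time, so memory use stays flat however large the file is.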