Does hashlib support a file mode?

Adam Tauno Williams awilliam at whitemice.org
Wed Jul 6 06:55:31 EDT 2011


On Tue, 2011-07-05 at 22:54 -0700, Phlip wrote:
> Pythonistas
> Consider this hashing code:
>   import hashlib
>   file = open(path)
>   m = hashlib.md5()
>   m.update(file.read())
>   digest = m.hexdigest()
>   file.close()
> If the file were huge, the file.read() would allocate a big string and
> thrash memory. (Yes, in 2011 that's still a problem, because these
> files could be movies and whatnot.)

Yes, the simple rule is: do not *ever* call file.read() without a size
argument.  No matter what the year, this will never be OK.  Always read
a file in reasonably sized I/O blocks.

For example, I use this function to copy a stream and return a SHA512
hex digest and the output stream's size:

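    import hashlib

    def write(self, in_handle, out_handle):
        # Copy in_handle to out_handle in 4 KiB chunks, hashing as we go.
        # Both handles are expected to be opened in binary mode.
        m = hashlib.sha512()
        size = 0
        while True:
            data = in_handle.read(4096)
            if not data:
                break
            m.update(data)
            out_handle.write(data)
            size += len(data)
        out_handle.flush()
        # Count bytes ourselves; tell() fails on non-seekable streams.
        return (m.hexdigest(), size)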
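
Applied to the original question (hashing only, no copy), the same loop
might look like the sketch below.  The name md5_of_file and the
chunk_size default are illustrative, not part of hashlib, and the file
is opened in binary mode on the assumption that the raw bytes are what
should be hashed.

```python
import hashlib

def md5_of_file(path, chunk_size=4096):
    """Return the MD5 hex digest of the file at path, read in chunks."""
    m = hashlib.md5()
    # Binary mode: hash the raw bytes, with no newline translation.
    with open(path, 'rb') as f:
        # iter() with a sentinel calls f.read(chunk_size) until it
        # returns b'' at end of file.
        for data in iter(lambda: f.read(chunk_size), b''):
            m.update(data)
    return m.hexdigest()
```

Only one chunk is held in memory at a time, so the size of the file no
longer matters.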

> Does hashlib have a file-ready mode, to hide the streaming inside some
> clever DMA operations?

Chunk it to something close to the block size of your underlying
filesystem.
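
One possible way to pick such a size, as a sketch: on Unix-like systems
os.stat() reports the filesystem's preferred I/O block size as
st_blksize.  That attribute is not available on every platform, so the
helper below (chunk_size_for is a made-up name) falls back to 4096, the
value used in the function above.

```python
import os

def chunk_size_for(path, default=4096):
    """Return the filesystem's preferred I/O block size for path.

    st_blksize is missing on some platforms (e.g. Windows) and may be
    reported as 0, so fall back to default in those cases.
    """
    blksize = getattr(os.stat(path), 'st_blksize', 0)
    return blksize if blksize > 0 else default
```

The result can be passed as the size argument to read() in a chunked
loop.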



