[Python-ideas] incremental hashing in __hash__

Ned Batchelder ned at nedbatchelder.com
Wed Dec 28 11:48:10 EST 2016


On 12/27/16 10:13 PM, jab at math.brown.edu wrote:
> Applying this advice to the use cases above would require creating an
> arbitrarily large tuple in memory before passing it to hash(), which
> is then just thrown away. It would be preferable if there were a way
> to pass multiple values to hash() in a streaming fashion, such that
> the overall hash were computed incrementally, without building up a
> large object in memory first.
>
> Should there be better support for this use case? Perhaps hash() could
> support an alternative signature, allowing it to accept a stream of
> values whose combined hash would be computed incrementally in
> *constant* space and linear time, e.g. "hash(items=iter(self))".
You can write a simple function to use hash iteratively to hash the
entire stream in constant space and linear time:

    def hash_stream(them):
        val = 0
        for it in them:
            val = hash((val, it))
        return val

Although this creates N 2-tuples, they come and go, so the memory use
won't grow.  Adjust the code as needed to achieve canonicalization
before iterating.

Or maybe I am misunderstanding the requirements?

--Ned.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20161228/f0982e0d/attachment.html>


More information about the Python-ideas mailing list