[Python-ideas] incremental hashing in __hash__
Ned Batchelder
ned at nedbatchelder.com
Wed Dec 28 11:48:10 EST 2016
On 12/27/16 10:13 PM, jab at math.brown.edu wrote:
> Applying this advice to the use cases above would require creating an
> arbitrarily large tuple in memory before passing it to hash(), which
> is then just thrown away. It would be preferable if there were a way
> to pass multiple values to hash() in a streaming fashion, such that
> the overall hash were computed incrementally, without building up a
> large object in memory first.
>
> Should there be better support for this use case? Perhaps hash() could
> support an alternative signature, allowing it to accept a stream of
> values whose combined hash would be computed incrementally in
> *constant* space and linear time, e.g. "hash(items=iter(self))".
You can write a simple function to use hash iteratively to hash the
entire stream in constant space and linear time:
def hash_stream(them):
val = 0
for it in them:
val = hash((val, it))
return val
Although this creates N 2-tuples, they come and go, so the memory use
won't grow. Adjust the code as needed to achieve canonicalization
before iterating.
Or maybe I am misunderstanding the requirements?
--Ned.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20161228/f0982e0d/attachment.html>
More information about the Python-ideas
mailing list