[Python-ideas] incremental hashing in __hash__

jab at math.brown.edu jab at math.brown.edu
Wed Dec 28 12:27:59 EST 2016


On Wed, Dec 28, 2016 at 11:48 AM, Ned Batchelder <ned at nedbatchelder.com>
wrote:

> You can write a simple function to use hash iteratively to hash the entire
> stream in constant space and linear time:
>
>     def hash_stream(them):
>         val = 0
>         for it in them:
>             val = hash((val, it))
>         return val
>
> Although this creates N 2-tuples, they come and go, so the memory use
> won't grow.  Adjust the code as needed to achieve canonicalization before
> iterating.
>
> Or maybe I am misunderstanding the requirements?
>

This is better than solutions like http://stackoverflow.com/a/
27952689/161642 in the sense that it's a little higher level (no bit
shifting or magic numbers).

But it's not clear that it's any better in the sense that you're still
rolling your own incremental hash algorithm out of a lower-level primitive
that doesn't document support for this, and therefore taking responsibility
yourself for how well it distributes values into buckets.

Are you confident this results in good hash performance? Is this better
than a solution built on top of a hash function with an explicit API for
calculating a hash incrementally, such as the crc32 example I included?
(And again, this would ideally be a sys.hash_info.hash_bits -bit algorithm.)

Don't we still probably want either:

1) Python to provide some such hash_stream() function as a built-in,

or failing that,

2) the https://docs.python.org/3/reference/datamodel.html#object.__hash__
documentation to bless this as the recommended solution to this problem,
thereby providing assurance of its performance?

If that makes sense, I'd be happy to file an issue, and include the start
of a patch providing either 1 or 2.

Thanks very much for the helpful response.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20161228/a6a42266/attachment.html>


More information about the Python-ideas mailing list