[Python-ideas] incremental hashing in __hash__
Matt Gilson
matt at getpattern.com
Thu Jan 5 12:32:12 EST 2017
I agree with Paul -- I'm not convinced that this is common enough or that
the benefits are big enough to warrant something builtin. However, I did
decide to dust off some of my old skills and threw together a simple gist
to see how hard it would be to build something in Cython based on the
CPython tuple hash algorithm. I don't know how well it works for arbitrary
iterables without a `__length_hint__`, but it seems to work as intended for
iterables that provide one.
https://gist.github.com/mgilson/129859a79487a483163980db25b709bf
If you're interested, or want to pick this up and actually do something
with it, feel free. Also, I haven't written anything in Cython for ages,
so if this could be optimized further, please let me know.
On Thu, Jan 5, 2017 at 7:58 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 5 January 2017 at 13:28, Neil Girdhar <mistersheik at gmail.com> wrote:
> > The point is that the OP doesn't want to write his own hash function, but
> > wants Python to provide a standard way of hashing an iterable. Today, the
> > standard way is to convert to tuple and call hash on that. That may not be
> > efficient. FWIW from a style perspective, I agree with OP.
>
> The debate here regarding tuple/frozenset indicates that there may not
> be a "standard way" of hashing an iterable (should order matter?).
> Although I agree that assuming order matters is a reasonable
> assumption to make in the absence of any better information.
>
> Hashing is low enough level that providing helpers in the stdlib is
> not unreasonable. It's not obvious (to me, at least) that it's a
> common enough need to warrant it, though. Do we have any information
> on how often people implement their own __hash__, or how often
> hash(tuple(my_iterable)) would be an acceptable hash, except for the
> cost of creating the tuple? The OP's request is the only time this has
> come up as a requirement, to my knowledge. Hence my suggestion to copy
> the tuple implementation, modify it to work with general iterables,
> and publish it as a 3rd party module - its usage might give us an idea
> of how often this need arises. (The other option would be for someone
> to do some analysis of published code).
>
> Assuming it is a sufficiently useful primitive to add, then we can
> debate naming. But I'd prefer it to be named in such a way that it
> makes it clear that it's a low-level helper for people writing their
> own __hash__ function, and not some sort of variant of hashing (which
> hash.from_iterable implies to me).
>
> Paul