[Python-ideas] incremental hashing in __hash__

Neil Girdhar mistersheik at gmail.com
Thu Jan 5 08:28:30 EST 2017


The point is that the OP doesn't want to write his own hash function, but
wants Python to provide a standard way of hashing an iterable.  Today, the
standard way is to convert to tuple and call hash on that.  That may not be
efficient. FWIW from a style perspective, I agree with OP.

On Thu, Jan 5, 2017 at 4:30 AM M.-A. Lemburg <mal at egenix.com> wrote:

> On 28.12.2016 04:13, jab at math.brown.edu wrote:
> > Suppose you have implemented an immutable Position type to represent
> > the state of a game played on an MxN board, where the board size can
> > grow quite large.
> > ...
> >
> > According to
> https://docs.python.org/3/reference/datamodel.html#object.__hash__
> > :
> >
> >
> > """
> > it is advised to mix together the hash values of the components of the
> > object that also play a part in comparison of objects by packing them
> > into a tuple and hashing the tuple. Example:
> >
> > def __hash__(self):
> >     return hash((self.name, self.nick, self.color))
> >
> > """
> >
> >
> > Applying this advice to the use cases above would require creating an
> > arbitrarily large tuple in memory before passing it to hash(), which
> > is then just thrown away. It would be preferable if there were a way
> > to pass multiple values to hash() in a streaming fashion, such that
> > the overall hash were computed incrementally, without building up a
> > large object in memory first.
>
> I think there's a misunderstanding here: the hash(obj) built-in
> merely interfaces to the obj.__hash__() method (or the tp_hash slot
> for C types) and returns whatever these methods give.
>
> It doesn't implement any logic by itself.
>
> If you would like to implement a more efficient hash algorithm
> for your types, just go ahead and write them as .__hash__()
> method or tp_hash slot method and you're done.
>
> The example from the docs is just to showcase an example of
> how such a hash function should work, i.e. to mix in all
> relevant data attributes.
>
> In your case, you'd probably use a simple for loop to calculate
> the hash without creating tuples or any other temporary
> structures.
>
> Here's the hash implementation tuples use as an example
>
> /* The addend 82520, was selected from the range(0, 1000000) for
>    generating the greatest number of prime multipliers for tuples
>    upto length eight:
>
>      1082527, 1165049, 1082531, 1165057, 1247581, 1330103, 1082533,
>      1330111, 1412633, 1165069, 1247599, 1495177, 1577699
>
>    Tests have shown that it's not worth to cache the hash value, see
>    issue #9685.
> */
>
> static Py_hash_t
> tuplehash(PyTupleObject *v)
> {
>     Py_uhash_t x;  /* Unsigned for defined overflow behavior. */
>     Py_hash_t y;
>     Py_ssize_t len = Py_SIZE(v);
>     PyObject **p;
>     Py_uhash_t mult = _PyHASH_MULTIPLIER;
>     x = 0x345678UL;
>     p = v->ob_item;
>     while (--len >= 0) {
>         y = PyObject_Hash(*p++);
>         if (y == -1)
>             return -1;
>         x = (x ^ y) * mult;
>         /* the cast might truncate len; that doesn't change hash
> stability */
>         mult += (Py_hash_t)(82520UL + len + len);
>     }
>     x += 97531UL;
>     if (x == (Py_uhash_t)-1)
>         x = -2;
>     return x;
> }
>
> As you can see, there's some magic going on there to make
> sure that the hash values behave well when used as "keys"
> for the dictionary implementation (which is their main
> purpose in Python).
>
> You are free to create your own hash implementation.
> The only characteristic to pay attention to is to have
> objects which compare equal give the same hash value.
> This is needed to be able to map such objects to the same
> dictionary slots.
>
> There should be no need to have a special hash function which
> works on iterables. As long as those iterable objects define
> their own .__hash__() method or tp_slot, the hash() built-in
> (and Python's dict implementation) will use these and, if needed,
> those methods can then use an approach to build hash values
> using iterators on the object's internal data along similar
> lines as the above tuple implementation.
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> Professional Python Services directly from the Experts (#1, Jan 05 2017)
> >>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
> >>> Python Database Interfaces ...           http://products.egenix.com/
> >>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
> ________________________________________________________________________
>
> ::: We implement business ideas - efficiently in both time and costs :::
>
>    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>            Registered at Amtsgericht Duesseldorf: HRB 46611
>                http://www.egenix.com/company/contact/
>                       http://www.malemburg.com/
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
> --
>
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "python-ideas" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/python-ideas/XcuC01a8SYs/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> python-ideas+unsubscribe at googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20170105/5495df1d/attachment.html>


More information about the Python-ideas mailing list