[Python-ideas] incremental hashing in __hash__

Neil Girdhar mistersheik at gmail.com
Wed Jan 4 20:04:04 EST 2017


Couldn't you add __hash__ to collections.abc.Iterable ?  Essentially, 
expose __hash__ there; then all iterables automatically have a default hash 
that hashes their ordered contents.

On Wednesday, January 4, 2017 at 7:37:26 PM UTC-5, Steven D'Aprano wrote:
>
> On Wed, Jan 04, 2017 at 04:38:05PM -0500, j... at math.brown.edu 
> <javascript:> wrote: 
> > Instead of the proposals like "hash.from_iterable()", would it make 
> sense 
> > to allow tuple.__hash__() to accept any iterable, when called as a 
> > classmethod? 
>
> The public API for calculating the hash of something is to call the 
> hash() builtin function on some object, e.g. to call tuple.__hash__ you 
> write hash((a, b, c)). The __hash__ dunder method is implementation, not 
> interface, and normally shouldn't be called directly. 
>
> Unless I'm missing something obvious, your proposal would require the 
> caller to call the dunder methods directly: 
>
> class X: 
>     def __hash__(self): 
>         return tuple.__hash__(iter(self)) 
>
> I consider that a poor interface design. 
>
> But even if we decide to make an exception in this case, tuple.__hash__ 
> is currently an ordinary instance method right now. There's probably 
> code that relies on that fact and expects that: 
>
>     tuple.__hash__((a, b, c)) 
>
> is currently the same as 
>
>     (a, b, c).__hash__() 
>
>
> (Starting with the hash() builtin itself, I expect, although that is 
> easy enough to fix if needed.) Your proposal will break backwards 
> compatibility, as it requires a change in semantics: 
>
> (1) (a, b, c).__hash__() must keep the current behaviour, which 
> means behaving like a bound instance method; 
>
> (2) But tuple.__hash__ will no longer return an unbound method (actually 
> a function object, but the difference is unimportant) and instead will 
> return something that behaves like a bound class method. 
>
> Here's an implementation which does this: 
>
> http://code.activestate.com/recipes/577030-dualmethod-descriptor/ 
>
> so such a thing is possible. But it breaks backwards-compatability and 
> introduces something which I consider to be an unclean API (calling a 
> dunder method directly). Unless there's a *really* strong advantage to 
>
>     tuple.__hash__(...) 
>
> over 
>
>     hash.from_iterable(...) 
>
> (or equivalent), I would be against this change. 
>
>
>
> > (And similarly with frozenset.__hash__(), so that the fast C 
> > implementation of that algorithm could be used, rather than the slow 
> > collections.Set._hash() implementation. Then the duplicated 
> implementation 
> > in _collections_abc.py's Set._hash() could be removed completely, 
> > delegating to frozenset.__hash__() instead.) 
>
> This is a good point. Until now, I've been assuming that 
> hash.from_iterable should consider order. But frozenset shows us that 
> sometimes the hash should *not* consider order. 
>
> This hints that perhaps the hash.from_iterable() should have its own 
> optional dunder method. Or maybe we need two functions: an ordered 
> version and an unordered version. 
>
> Hmmm... just tossing out a wild idea here... let's get rid of the dunder 
> method part of your suggestion, and add new public class methods to 
> tuple and frozenset: 
>
>     tuple.hash_from_iter(iterable) 
>     frozenset.hash_from_iter(iterable) 
>
>
> That gets rid of all the objections about backwards compatibility, since 
> these are new methods. They're not dunder names, so there are no 
> objections to being used as part of the public API. 
>
> A possible objection is the question, is this functionality *actually* 
> important enough to bother? 
>
> Another possible objection: are these methods part of the sequence/set 
> API? If not, do they really belong on the tuple/frozenset? Maybe they 
> belong elsewhere? 
>
>
>
> > Would this API more cleanly communicate the algorithm being used and the 
> > implementation, 
>
> No. If you want to communicate the algorithm being used, write some 
> documentation. 
>
> Seriously, the public API doesn't communicate the algorithm used for the 
> implementation. How can it? We can keep the same interface and change 
> the implementation, or change the interface and keep the implementation. 
> The two are (mostly) independent. 
>
>
>
> > while making a smaller increase in API surface area 
> > compared to introducing a new function? 
>
> It's difficult to quantify "API surface area". On the one hand, we have 
> the addition of one or two new functions or methods. Contrast with: 
>
>     * introducing a new kind of method into the built-ins (one which 
>       behaves like a classmethod when called from the class, and like 
>       an instance method when called from an instance); 
>
>     * changing tuple.__hash__ from an ordinary method to one of the 
>       above special methods; 
>
>     * and likewise for frozenset.__hash__; 
>
>     * change __hash__ from "only used as implementation, not as 
>       interface" to "sometimes used as interface". 
>
>
> To me, adding one or two new methods/functions is the smaller, or at 
> least less disruptive, change. 
>
>
>
> -- 
> Steve 
> _______________________________________________ 
> Python-ideas mailing list 
> Python... at python.org <javascript:> 
> https://mail.python.org/mailman/listinfo/python-ideas 
> Code of Conduct: http://python.org/psf/codeofconduct/ 
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20170104/b1305649/attachment-0001.html>


More information about the Python-ideas mailing list