<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Jan 5, 2017 at 4:00 AM Matt Gilson <<a href="mailto:matt@getpattern.com">matt@getpattern.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">But, I think that the problem with adding `__hash__` to collections.abc.Iterable is that not all iterables are immutable -- And if they aren't immutable, then allowing them to be hashed is likely to be a pretty bad idea...</div></blockquote><div><br></div><div>Good point. A better option is to add collections.abc.ImmutableIterable that derives from Iterable and provides __hash__. Since tuple inherits from it, it can choose to delegate up. Then I think everyone is happy. </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">I'm still having a hard time being convinced that this is very much of an optimization at all ...</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">If you start hashing tuples that are large enough that memory is a concern, then that's going to also take a <i class="gmail_msg">really</i> long time and probably be prohibitive anyway. Just for kicks, I decided to throw together a simple script to time how much penalty you pay for hashing a tuple:</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg">class F(object):</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> def __init__(self, arg):</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> self.arg = arg</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> def __hash__(self):</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> return hash(tuple(self.arg))</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg">class T(object):</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> def __init__(self, arg):</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> self.arg = tuple(arg)</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> def __hash__(self):</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> return hash(self.arg)</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg">class C(object):</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> def __init__(self, arg):</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> self.arg = tuple(arg)</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> self._hash = None</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> def __hash__(self):</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> if self._hash is None:</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> self._hash = hash(tuple(self.arg))</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> return self._hash</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg">import timeit</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg">print(timeit.timeit('hash(f)', 'from __main__ import F; f = F(list(range(500)))'))</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg">print(timeit.timeit('hash(t)', 'from __main__ import T; t = T(list(range(500)))'))</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg">print(timeit.timeit('hash(c)', 'from __main__ import C; c = C(list(range(500)))'))</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg">results = []</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg">for i in range(1, 11):</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> n = i * 100</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> t1 = timeit.timeit('hash(f)', 'from __main__ import F; f = F(list(range(%d)))' % i)</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> t2 = timeit.timeit('hash(t)', 'from __main__ import T; t = T(list(range(%d)))' % i)</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"> results.append(t1/t2)</font></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg">print(results)</font></div></div><div class="gmail_msg"><font face="monospace, monospace" class="gmail_msg"><br class="gmail_msg"></font></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">F is going to create a new tuple each time and then hash it. T already has a tuple, so we'll only pay the cost of hashing a tuple, not the cost of constructing a tuple and C caches the hash value and re-uses it once it is known. C is the winner by a factor of 10 or more (no surprise there). But the real interesting thing is that the the ratio of the timing results from hashing `F` vs. `T` is relatively constant in the range of my test (up to 1000 elements) and that ratio's value is approximately 1.3. For most applications, that seems reasonable. If you really need a speed-up, then I suppose you could recode the thing in Cython and see what happens, but I doubt that will be frequently necessary. If you _do_ code it up in Cython, put it up on Pypi and see if people use it...</div><div class="gmail_msg"><br class="gmail_msg"></div></div><div class="gmail_extra gmail_msg"><br class="gmail_msg"><div class="gmail_quote gmail_msg"></div></div><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg">On Wed, Jan 4, 2017 at 5:04 PM, Neil Girdhar <span dir="ltr" class="gmail_msg"><<a href="mailto:mistersheik@gmail.com" class="gmail_msg" target="_blank">mistersheik@gmail.com</a>></span> wrote:<br class="gmail_msg"></div></div><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">Couldn't you add __hash__ to collections.abc.Iterable ? Essentially, expose __hash__ there; then all iterables automatically have a default hash that hashes their ordered contents.<br class="gmail_msg"><br class="gmail_msg">On Wednesday, January 4, 2017 at 7:37:26 PM UTC-5, Steven D'Aprano wrote:<blockquote class="gmail_quote gmail_msg" style="margin:0;margin-left:0.8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_msg"><div class="m_5443167817305544922h5 gmail_msg">On Wed, Jan 04, 2017 at 04:38:05PM -0500, <a rel="nofollow" class="gmail_msg">j...@math.brown.edu</a> wrote:
<br class="gmail_msg">> Instead of the proposals like "hash.from_iterable()", would it make sense
<br class="gmail_msg">> to allow tuple.__hash__() to accept any iterable, when called as a
<br class="gmail_msg">> classmethod?
<br class="gmail_msg">
<br class="gmail_msg">The public API for calculating the hash of something is to call the
<br class="gmail_msg">hash() builtin function on some object, e.g. to call tuple.__hash__ you
<br class="gmail_msg">write hash((a, b, c)). The __hash__ dunder method is implementation, not
<br class="gmail_msg">interface, and normally shouldn't be called directly.
<br class="gmail_msg">
<br class="gmail_msg">Unless I'm missing something obvious, your proposal would require the
<br class="gmail_msg">caller to call the dunder methods directly:
<br class="gmail_msg">
<br class="gmail_msg">class X:
<br class="gmail_msg"> def __hash__(self):
<br class="gmail_msg"> return tuple.__hash__(iter(self))
<br class="gmail_msg">
<br class="gmail_msg">I consider that a poor interface design.
<br class="gmail_msg">
<br class="gmail_msg">But even if we decide to make an exception in this case, tuple.__hash__
<br class="gmail_msg">is currently an ordinary instance method right now. There's probably
<br class="gmail_msg">code that relies on that fact and expects that:
<br class="gmail_msg">
<br class="gmail_msg"> tuple.__hash__((a, b, c))
<br class="gmail_msg">
<br class="gmail_msg">is currently the same as
<br class="gmail_msg">
<br class="gmail_msg"> (a, b, c).__hash__()
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">(Starting with the hash() builtin itself, I expect, although that is
<br class="gmail_msg">easy enough to fix if needed.) Your proposal will break backwards
<br class="gmail_msg">compatibility, as it requires a change in semantics:
<br class="gmail_msg">
<br class="gmail_msg">(1) (a, b, c).__hash__() must keep the current behaviour, which
<br class="gmail_msg">means behaving like a bound instance method;
<br class="gmail_msg">
<br class="gmail_msg">(2) But tuple.__hash__ will no longer return an unbound method (actually
<br class="gmail_msg">a function object, but the difference is unimportant) and instead will
<br class="gmail_msg">return something that behaves like a bound class method.
<br class="gmail_msg">
<br class="gmail_msg">Here's an implementation which does this:
<br class="gmail_msg">
<br class="gmail_msg"><a href="http://code.activestate.com/recipes/577030-dualmethod-descriptor/" rel="nofollow" class="gmail_msg" target="_blank">http://code.activestate.com/recipes/577030-dualmethod-descriptor/</a>
<br class="gmail_msg">
<br class="gmail_msg">so such a thing is possible. But it breaks backwards-compatability and
<br class="gmail_msg">introduces something which I consider to be an unclean API (calling a
<br class="gmail_msg">dunder method directly). Unless there's a *really* strong advantage to
<br class="gmail_msg">
<br class="gmail_msg"> tuple.__hash__(...)
<br class="gmail_msg">
<br class="gmail_msg">over
<br class="gmail_msg">
<br class="gmail_msg"> hash.from_iterable(...)
<br class="gmail_msg">
<br class="gmail_msg">(or equivalent), I would be against this change.
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">> (And similarly with frozenset.__hash__(), so that the fast C
<br class="gmail_msg">> implementation of that algorithm could be used, rather than the slow
<br class="gmail_msg">> collections.Set._hash() implementation. Then the duplicated implementation
<br class="gmail_msg">> in _collections_abc.py's Set._hash() could be removed completely,
<br class="gmail_msg">> delegating to frozenset.__hash__() instead.)
<br class="gmail_msg">
<br class="gmail_msg">This is a good point. Until now, I've been assuming that
<br class="gmail_msg">hash.from_iterable should consider order. But frozenset shows us that
<br class="gmail_msg">sometimes the hash should *not* consider order.
<br class="gmail_msg">
<br class="gmail_msg">This hints that perhaps the hash.from_iterable() should have its own
<br class="gmail_msg">optional dunder method. Or maybe we need two functions: an ordered
<br class="gmail_msg">version and an unordered version.
<br class="gmail_msg">
<br class="gmail_msg">Hmmm... just tossing out a wild idea here... let's get rid of the dunder
<br class="gmail_msg">method part of your suggestion, and add new public class methods to
<br class="gmail_msg">tuple and frozenset:
<br class="gmail_msg">
<br class="gmail_msg"> tuple.hash_from_iter(iterable)
<br class="gmail_msg"> frozenset.hash_from_iter(iterable)
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">That gets rid of all the objections about backwards compatibility, since
<br class="gmail_msg">these are new methods. They're not dunder names, so there are no
<br class="gmail_msg">objections to being used as part of the public API.
<br class="gmail_msg">
<br class="gmail_msg">A possible objection is the question, is this functionality *actually*
<br class="gmail_msg">important enough to bother?
<br class="gmail_msg">
<br class="gmail_msg">Another possible objection: are these methods part of the sequence/set
<br class="gmail_msg">API? If not, do they really belong on the tuple/frozenset? Maybe they
<br class="gmail_msg">belong elsewhere?
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">> Would this API more cleanly communicate the algorithm being used and the
<br class="gmail_msg">> implementation,
<br class="gmail_msg">
<br class="gmail_msg">No. If you want to communicate the algorithm being used, write some
<br class="gmail_msg">documentation.
<br class="gmail_msg">
<br class="gmail_msg">Seriously, the public API doesn't communicate the algorithm used for the
<br class="gmail_msg">implementation. How can it? We can keep the same interface and change
<br class="gmail_msg">the implementation, or change the interface and keep the implementation.
<br class="gmail_msg">The two are (mostly) independent.
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">> while making a smaller increase in API surface area
<br class="gmail_msg">> compared to introducing a new function?
<br class="gmail_msg">
<br class="gmail_msg">It's difficult to quantify "API surface area". On the one hand, we have
<br class="gmail_msg">the addition of one or two new functions or methods. Contrast with:
<br class="gmail_msg">
<br class="gmail_msg"> * introducing a new kind of method into the built-ins (one which
<br class="gmail_msg"> behaves like a classmethod when called from the class, and like
<br class="gmail_msg"> an instance method when called from an instance);
<br class="gmail_msg">
<br class="gmail_msg"> * changing tuple.__hash__ from an ordinary method to one of the
<br class="gmail_msg"> above special methods;
<br class="gmail_msg">
<br class="gmail_msg"> * and likewise for frozenset.__hash__;
<br class="gmail_msg">
<br class="gmail_msg"> * change __hash__ from "only used as implementation, not as
<br class="gmail_msg"> interface" to "sometimes used as interface".
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">To me, adding one or two new methods/functions is the smaller, or at
<br class="gmail_msg">least less disruptive, change.
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">--
<br class="gmail_msg">Steve
<br class="gmail_msg">_______________________________________________
<br class="gmail_msg">Python-ideas mailing list
<br class="gmail_msg"></div></div><a rel="nofollow" class="gmail_msg">Python...@python.org</a>
<br class="gmail_msg"><span class="gmail_msg"><a href="https://mail.python.org/mailman/listinfo/python-ideas" rel="nofollow" class="gmail_msg" target="_blank">https://mail.python.org/mailman/listinfo/python-ideas</a>
<br class="gmail_msg">Code of Conduct: <a href="http://python.org/psf/codeofconduct/" rel="nofollow" class="gmail_msg" target="_blank">http://python.org/psf/codeofconduct/</a>
<br class="gmail_msg"></span></blockquote></div><br class="gmail_msg"></blockquote></div></div><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">_______________________________________________<br class="gmail_msg">
Python-ideas mailing list<br class="gmail_msg">
<a href="mailto:Python-ideas@python.org" class="gmail_msg" target="_blank">Python-ideas@python.org</a><br class="gmail_msg">
<a href="https://mail.python.org/mailman/listinfo/python-ideas" rel="noreferrer" class="gmail_msg" target="_blank">https://mail.python.org/mailman/listinfo/python-ideas</a><br class="gmail_msg">
Code of Conduct: <a href="http://python.org/psf/codeofconduct/" rel="noreferrer" class="gmail_msg" target="_blank">http://python.org/psf/codeofconduct/</a><br class="gmail_msg"></blockquote></div></div><div class="gmail_extra gmail_msg"><br class="gmail_msg"><br clear="all" class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div>-- <br class="gmail_msg"><div class="m_5443167817305544922gmail_signature gmail_msg" data-smartmail="gmail_signature"><div dir="ltr" class="gmail_msg"><span class="gmail_msg"><br class="gmail_msg"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" class="gmail_msg"><span style="font-size:14.6667px;font-family:Arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent" class="gmail_msg"><img src="https://lh6.googleusercontent.com/KjKhxPuGD6aSETVh9dazBSiS5FGrt8b37DYQ6QZAxpJIomTEygFF6ygbgOUbR0MBzycO8-LF-FspB3wTJb8LXEfIbJ6lN4H2J04EYzOZg7uaG7YcGkWesdldHuXzhLwS-etJBGO6" width="95" height="26" style="border:none" alt="pattern-sig.png" class="gmail_msg"></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" class="gmail_msg"><span style="font-size:10.6667px;font-family:Arial;color:rgb(102,102,102);font-weight:700;vertical-align:baseline;white-space:pre-wrap" class="gmail_msg">Matt Gilson</span><span style="font-size:10.6667px;font-family:Arial;color:rgb(102,102,102);vertical-align:baseline;white-space:pre-wrap" class="gmail_msg"> // SOFTWARE ENGINEER</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" class="gmail_msg"><span style="font-size:10.6667px;font-family:Arial;color:rgb(153,153,153);vertical-align:baseline;white-space:pre-wrap" class="gmail_msg">E: </span><span style="font-size:10.6667px;font-family:Arial;color:rgb(17,85,204);vertical-align:baseline;white-space:pre-wrap" class="gmail_msg"><a href="mailto:matt@getpattern.com" class="gmail_msg" target="_blank">matt@getpattern.com</a></span><span style="font-size:10.6667px;font-family:Arial;color:rgb(153,153,153);vertical-align:baseline;white-space:pre-wrap" class="gmail_msg"> // P: <a href="tel:(603)%20892-7736" value="+16038927736" class="gmail_msg" target="_blank">603.892.7736</a></span></p><br class="gmail_msg"><span style="font-size:12px;font-family:Arial;color:rgb(255,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent" class="gmail_msg">We’re looking for beta testers. Go </span><a href="https://www.getpattern.com/meetpattern" style="text-decoration:none" class="gmail_msg" target="_blank"><span style="font-size:12px;font-family:Arial;color:rgb(17,85,204);text-decoration:underline;vertical-align:baseline;white-space:pre-wrap;background-color:transparent" class="gmail_msg">here</span></a><span style="font-size:12px;font-family:Arial;color:rgb(255,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent" class="gmail_msg"> to sign up!</span></span><br class="gmail_msg"></div></div>
</div></blockquote></div></div>