On Sun, Oct 2, 2011 at 8:45 PM, Terry Reedy wrote:
On 10/2/2011 1:28 PM, Guido van Rossum wrote:
The problem here seems to be that collections/abc.py defines Iterable to have __iter__ but not __contains__, but the Python language defines the 'in' operator as trying __contains__ first, and if that is not defined, using __iter__.
This is not surprising given Python's history, but it does cause some confusion when one compares the ABCs with the actual behavior. I also think that the way the ABCs have it makes more sense -- for single-use iterables (like files) the default behavior of "in" exhausts the iterator which is costly and fairly useless.
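A minimal sketch of the fallback described above and of its cost on a one-shot iterator. The class name `OnlyIter` and the sample data are invented for illustration, and `io.StringIO` stands in for a real file object.

```python
import io


class OnlyIter:
    """Hypothetical class: defines __iter__ but not __contains__."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


# No __contains__ here, so 'in' falls back to looping over __iter__.
found = 2 in OnlyIter([1, 2, 3])

# On a single-use iterator the membership test consumes everything
# up to and including the first match.
f = io.StringIO("a\nSTART\nb\nc\n")
hit = "START\n" in f
rest = list(f)      # only the lines after the match remain

# When there is no match, the whole stream is consumed.
g = io.StringIO("a\nb\n")
miss = "START\n" in g
leftover = list(g)  # nothing left to read
```

A failed test on a file-like object therefore leaves it positioned at end of input, which is the "costly and fairly useless" case.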
Now, should we change "in" to only look for __contains__ and not fall back on __iter__? If we were debating Python 3's feature set I would probably agree with that, as a clean break with the past and a clear future. Since we're debating Python 3.3, however, I think we should just lay it to rest and use the fallback solution proposed: define __contains__ on files to raise TypeError
That would break legitimate code that uses 'in file'. The following works as stated:
  if 'START\n' in f:
    for line in f: <process lines after the START line>
  else:
    <there are none>
There would have to be a deprecation process. But see below.
Hm. That code sample looks rather artificial. (Though now that I have seen it I can't help thinking that it might fit the bill for somebody... :-)
and leave the rest alone. Maybe make a note for Python 4. Maybe add a recommendation to PEP 8 to always implement __contains__ if you implement __iter__.
[Did you mean __next__?]
No, I really meant __iter__. Because in Python 4 I would be okay with not using a loop as a fallback if __contains__ doesn't exist. So if in Py3 "x in a" works by using __iter__, you would have to keep it working in Py4 by defining __contains__. And no, I don't expect Py4 within this decade...
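As a rough sketch of both options discussed here, the first class below pairs `__iter__` with an explicit `__contains__`, and the second opts out of membership tests by raising TypeError, as proposed for files. The names `Evens` and `OneShotLines` are hypothetical and not taken from any real library.

```python
class Evens:
    """Hypothetical container: even integers in range(0, limit)."""

    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        return iter(range(0, self.limit, 2))

    def __contains__(self, x):
        # Explicit membership test; does not rely on the __iter__ fallback.
        return isinstance(x, int) and 0 <= x < self.limit and x % 2 == 0


class OneShotLines:
    """Hypothetical single-use iterable that refuses 'in'."""

    def __init__(self, lines):
        self._it = iter(lines)

    def __iter__(self):
        return self._it

    def __contains__(self, x):
        # Assumed behavior, mirroring the proposal for file objects.
        raise TypeError("membership test not supported on a one-shot iterable")
```

With these definitions `4 in Evens(10)` is answered without looping, while `"x" in OneShotLines(["x"])` raises TypeError instead of silently consuming the iterator.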
It seems to me better that functions that need a re-iterable non-iterator input should check for the absence of .__next__ to exclude *all* iterables, including file objects. There is no need to complicate our nice, simple, minimal iterator protocol.
  if hasattr(reiterable, '__next__'):
    raise TypeError("non-iterator required")
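The check above can be wrapped in a small helper to see what it accepts and rejects. This is only a sketch: the function name `require_reiterable` is made up here, and the helper does not verify that its argument is iterable at all.

```python
import io


def require_reiterable(obj):
    """Hypothetical helper: reject iterators, i.e. objects with __next__."""
    if hasattr(obj, '__next__'):
        raise TypeError("non-iterator required")
    return obj


def is_rejected(obj):
    """Return True if require_reiterable() raises TypeError for obj."""
    try:
        require_reiterable(obj)
    except TypeError:
        return True
    return False


list_rejected = is_rejected([1, 2, 3])            # lists have no __next__
iter_rejected = is_rejected(iter([1, 2, 3]))      # list iterators do
gen_rejected = is_rejected(x for x in range(3))   # generators do
file_rejected = is_rejected(io.StringIO("a\n"))   # file-like objects do
```

Lists and other containers pass, while list iterators, generators and file-like objects are rejected because each of them defines `__next__`.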
That's a different issue -- you're talking about preventing bad use of __iter__ in the calling class. I was talking about supporting "in" by the defining class.
But let's not break existing code that depends on the current behavior -- we have better things to do than to break perfectly fine working code in a fit of pedantry.
Still, most people in this thread seem to agree that "x in file" works by accident, not by design, and is more likely to do harm than good, and many have in fact proposed various more serious ways of making it not work in (I presume) Py3.3.

--
--Guido van Rossum (python.org/~guido)