Should the definition of an "(async) iterator" include __iter__?
Over in https://github.com/python/typeshed/issues/6030 I have managed to kick up a discussion over what exactly an "iterator" is. If you look at https://docs.python.org/3/library/functions.html#iter you will see the docs say it "Return[s] an iterator <https://docs.python.org/3/glossary.html#term-iterator> object." Great, but if you go to the glossary definition of "iterator" at https://docs.python.org/3/glossary.html#term-iterator you will see it says "[i]terators are required to have an __iter__() <https://docs.python.org/3/reference/datamodel.html#object.__iter__> method" which neither `for` nor `iter()` actually enforce.

Is there something to do here? Do we loosen the definition of "iterator" to say they *should* define __iter__? Leave it as-is with an understanding that we know that it's technically inaccurate for iter() but that we want to encourage people to define __iter__? I'm assuming people don't want to change `for` and `iter()` to start requiring __iter__ be defined if we decided to go down the "remove the __aiter__ requirement" from aiter() last week.

BTW all of this applies to async iterators as well.
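A minimal sketch of the behaviour Brett describes, assuming CPython 3.7 or later (for `asyncio.run`). `NextOnly`, `Container`, `AsyncNextOnly`, `AsyncContainer` and `collect` are hypothetical names made up for illustration: the object handed back by `__iter__` (or `__aiter__`) defines only `__next__` (or `__anext__`), yet `for` and `async for` both accept it.

```python
import asyncio


class NextOnly:
    """Hypothetical iterator that defines __next__ but deliberately omits __iter__."""

    def __init__(self, items):
        self._items = list(items)

    def __next__(self):
        if not self._items:
            raise StopIteration
        return self._items.pop(0)


class Container:
    """Hypothetical iterable whose __iter__ hands back a NextOnly."""

    def __init__(self, items):
        self._items = items

    def __iter__(self):
        return NextOnly(self._items)


# In CPython, iter() only checks that the object returned by __iter__ has __next__.
it = iter(Container([1, 2, 3]))
print(type(it).__name__)                   # NextOnly
print([x for x in Container([1, 2, 3])])   # [1, 2, 3]


# The async analogue: __aiter__ returns an object that has only __anext__.
class AsyncNextOnly:
    def __init__(self, items):
        self._items = list(items)

    async def __anext__(self):
        if not self._items:
            raise StopAsyncIteration
        return self._items.pop(0)


class AsyncContainer:
    def __init__(self, items):
        self._items = items

    def __aiter__(self):
        return AsyncNextOnly(self._items)


async def collect(aiterable):
    # "async for" calls __aiter__ once, then awaits __anext__ repeatedly.
    return [x async for x in aiterable]


print(asyncio.run(collect(AsyncContainer([1, 2, 3]))))  # [1, 2, 3]
```

Nothing in either loop ever calls `__iter__` (or `__aiter__`) on the returned object, which is why the missing method goes unnoticed.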
My view of this is:

A. It's not an iterator if it doesn't define `__next__`.

B. It is strongly recommended that iterators also define `__iter__`.

In "standards" language, I think (A) is MUST and (B) is merely OUGHT or maybe SHOULD.

On Tue, Sep 14, 2021 at 12:30 PM Brett Cannon <brett@python.org> wrote:
Over in https://github.com/python/typeshed/issues/6030 I have managed to kick up a discussion over what exactly an "iterator" is. If you look at https://docs.python.org/3/library/functions.html#iter you will see the docs say it "Return[s] an iterator <https://docs.python.org/3/glossary.html#term-iterator> object." Great, but if you go to the glossary definition of "iterator" at https://docs.python.org/3/glossary.html#term-iterator you will see it says "[i]terators are required to have an __iter__() <https://docs.python.org/3/reference/datamodel.html#object.__iter__> method" which neither `for` nor `iter()` actually enforce.
Is there something to do here? Do we loosen the definition of "iterator" to say they *should* define __iter__? Leave it as-is with an understanding that we know that it's technically inaccurate for iter() but that we want to encourage people to define __iter__? I'm assuming people don't want to change `for` and `iter()` to start requiring __iter__ be defined if we decided to go down the "remove the __aiter__ requirement" from aiter() last week.
BTW all of this applies to async iterators as well. _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/3W7TDX5K... Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
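For contrast, a sketch of an iterator that satisfies both (A) and (B) as stated above. `Countdown` is a hypothetical example class, not something from the thread.

```python
class Countdown:
    """Hypothetical iterator meeting (A) __next__ and (B) __iter__ returning self."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # (B): an iterator is also an iterable; iterating it yields itself.
        return self

    def __next__(self):
        # (A): produce the next value or signal exhaustion.
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


it = Countdown(3)
assert iter(it) is it
print(list(it))  # [3, 2, 1]
```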
I think there is also a distinction to be made about the *current* meaning of "required" in "[i]terators are required to have an `__iter__()` <https://docs.python.org/3/reference/datamodel.html#object.__iter__> method": "required" doesn't specify whether this is:

1. by convention, and doing otherwise is just some form of undefined behaviour; for a human (or perhaps a type checker) reading it to think it's an iterator, it needs `__iter__`, but it's really something like passing an object of the wrong type to an unbound method: unenforced by the language (it used to be illegal in Py2); or
2. in some way actually enforced: the iterator is required to have `__iter__` that returns self, and

While 1 is clearly what actually happens in CPython, was that the intended meaning? I'd think so: 1 is still a perfectly acceptable interpretation of "required" (even if "required" isn't the clearest way of expressing it). Even if it wasn't the original meaning, that's how I think it should now be interpreted, because that's what it is de facto.

Do we know who originally wrote that line, so we could ask them? (The furthest I've traced it is https://github.com/python/cpython/commit/f10aa9825e49e8652f30bc6d92c736fe47b... but I don't have any knowledge of SVN or CVS (whichever was used at the time) to go further.)

Also, any user-defined iterator that doesn't also define __iter__ would be considered wrong, and nobody would refuse to fix that. If it's already a bug anyway, why bother changing the behaviour to check for it?
A. It's not an iterator if it doesn't define `__next__`.
B. It is strongly recommended that iterators also define `__iter__`.
In "standards" language, I think (A) is MUST and (B) is merely OUGHT or maybe SHOULD.
Patrick
On Tue, Sep 14, 2021 at 12:33:32PM -0700, Guido van Rossum wrote:
My view of this is:
A. It's not an iterator if it doesn't define `__next__`.
B. It is strongly recommended that iterators also define `__iter__`.
In "standards" language, I think (A) is MUST and (B) is merely OUGHT or maybe SHOULD.
That's not what the docs say :-)

https://docs.python.org/3/library/stdtypes.html#iterator-types

Part of the problem is that there are two kinds of thing that we call "iterator":

1. Objects that we implicitly or explicitly pass to `iter()` in order to return an iterator object; they only need to define an `__iter__` method that returns the actual iterator object itself. (That's a slight simplification, because iter() will fall back on the Sequence Protocol if `__iter__` isn't defined. But to my mind, that makes Sequence Protocol objects *iterables* not iterators.)

2. Iterator objects themselves, which are defined by a protocol, not a type. The iterator object MUST define both `__iter__` and `__next__`, and the `__iter__` method MUST return self.

-- Steve
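A small sketch of the iterable/iterator split, plus the sequence-protocol fallback mentioned in the parenthetical. `SequenceOnly` is a hypothetical class; the `isinstance` checks use the real `collections.abc.Iterable` and `collections.abc.Iterator` ABCs.

```python
from collections.abc import Iterable, Iterator


class SequenceOnly:
    """Hypothetical class that supports only the old sequence protocol (__getitem__)."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


numbers = [1, 2, 3]   # an iterable: has __iter__, no __next__
it = iter(numbers)    # an iterator: has both, and __iter__ returns self

assert isinstance(numbers, Iterable) and not isinstance(numbers, Iterator)
assert isinstance(it, Iterator) and iter(it) is it

# iter() falls back on __getitem__ when __iter__ is missing ...
assert list(iter(SequenceOnly())) == [0, 10, 20]
# ... but the Iterable ABC only looks for __iter__, so it answers "no".
assert not isinstance(SequenceOnly(), Iterable)
```

The last assertion is one concrete place where the ABC and `iter()` disagree about what counts as iterable.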
On Tue, Sep 14, 2021 at 9:03 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Sep 14, 2021 at 12:33:32PM -0700, Guido van Rossum wrote:
My view of this is:
A. It's not an iterator if it doesn't define `__next__`.
B. It is strongly recommended that iterators also define `__iter__`.
In "standards" language, I think (A) is MUST and (B) is merely OUGHT or maybe SHOULD.
That's not what the docs say :-)
https://docs.python.org/3/library/stdtypes.html#iterator-types
Huh, so it does. And in very clear words as well. I still don't think this should be enforced by checks for the presence of __iter__ in situations where it's not going to be called (e.g. in iter() itself and in "for x in it"). But since this is a longstanding convention and matches collections.abc.Iterator (and typing.Iterator) we might as well *document* it to be the case.
Part of the problem is that there are two kinds of thing that we call "iterator":
1. Objects that we implicitly or explicitly pass to `iter()` in order to return an iterator object; they only need to define an `__iter__` method that returns the actual iterator object itself.
No, we don't call that an iterator. That's an *Iterable*. In this respect the docs you point to are actually weak:

- It doesn't use the term Iterable at all but describes it as "container objects" or "containers".
- It says "Sequences, described below in more detail, always support the iteration methods." That's wrong, or at the very least misleading, since a sequence itself *only* supports __iter__ -- it's the Iterator returned by s.__iter__() that supports __next__.
(That's a slight simplification, because iter() will fall back on the Sequence Protocol if `__iter__` isn't defined. But to my mind, that makes Sequence Protocol objects *iterables* not iterators.)
Right, it's wrong.
2. Iterator objects themselves, which are defined by a protocol, not a type. The iterator object MUST define both `__iter__` and `__next__`, and the `__iter__` method MUST return self.
So you say. I will compromise and agree that Iterators MUST have __next__ and SHOULD have __iter__ returning self. The distinction is that without __next__ it's not an Iterator. But without __iter__ it's merely a broken Iterator (that nevertheless works in most situations). -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On 9/15/2021 12:33 AM, Guido van Rossum wrote:
On Tue, Sep 14, 2021 at 9:03 PM Steven D'Aprano <steve@pearwood.info <mailto:steve@pearwood.info>> wrote:
On Tue, Sep 14, 2021 at 12:33:32PM -0700, Guido van Rossum wrote:
> My view of this is:
>
> A. It's not an iterator if it doesn't define `__next__`.
>
> B. It is strongly recommended that iterators also define `__iter__`.
>
> In "standards" language, I think (A) is MUST and (B) is merely OUGHT or
> maybe SHOULD.
That's not what the docs say :-)
https://docs.python.org/3/library/stdtypes.html#iterator-types <https://docs.python.org/3/library/stdtypes.html#iterator-types>
Like Steven, I consider 'iterators are iterables' to be a very positive feature.
Huh, so it does. And in very clear words as well. I still don't think this should be enforced by checks for the presence of __iter__ in situations where it's not going to be called (e.g. in iter() itself and in "for x in it").
I agree with this also, as I consider 'duck typing' (delayed type checking by use) and 'consenting adults' (break rules at one's own risk) to also be features. If iter were to check for __iter__ on the returned object, it might as well call it to see if it returns the same object. That might be appropriate for a 'SergeantPython' implementation, but, to me, not for CPython.

-- Terry Jan Reedy
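A sketch of the stricter check Terry describes, written as a hypothetical `strict_iter()` helper so it can be compared with the real builtin. This is not how CPython behaves, and calling `__iter__` just to verify it can have side effects for unusual iterators, which is part of the argument against doing it. `NextOnly` and `Container` are hypothetical classes.

```python
class NextOnly:
    """Hypothetical iterator with __next__ but no __iter__."""

    def __init__(self, items):
        self._items = list(items)

    def __next__(self):
        if not self._items:
            raise StopIteration
        return self._items.pop(0)


class Container:
    def __init__(self, items):
        self._items = items

    def __iter__(self):
        return NextOnly(self._items)


def strict_iter(obj):
    """Hypothetical stricter iter(); NOT what CPython does."""
    it = iter(obj)
    iter_method = getattr(type(it), "__iter__", None)
    # Once you check for __iter__, you may as well call it and insist
    # that it hands back the very same object.
    if iter_method is None or iter_method(it) is not it:
        raise TypeError(f"{type(it).__name__!r} is not a well-behaved iterator")
    return it


print(list(strict_iter([1, 2])))   # [1, 2]
print(type(iter(Container([1]))))  # the real iter() accepts NextOnly
```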
I understood that _iterables_ are required to have an __iter__ method, not iterators. Therefore, are we simply discussing whether all iterators should be iterable? At the moment the CPython implementation doesn't require that AFAIK.

regards
Steve

On Tue, Sep 14, 2021 at 8:39 PM Guido van Rossum <guido@python.org> wrote:
My view of this is:
A. It's not an iterator if it doesn't define `__next__`.
B. It is strongly recommended that iterators also define `__iter__`.
In "standards" language, I think (A) is MUST and (B) is merely OUGHT or maybe SHOULD.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...> _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OICGRBPL... Code of Conduct: http://python.org/psf/codeofconduct/
On Sun, Sep 19, 2021 at 8:15 AM Steve Holden <steve@holdenweb.com> wrote:
I understood that _iterables_ are required to have an __iter__ method, not iterators.
Therefore, are we simply discussing whether all iterators should be iterable?
At this point it's more about how to document this.
At the moment the CPython implementation doesn't require that AFAIK.
Correct. I plan to go through the docs and clarify things. I opened https://bugs.python.org/issue45250 to track this.
regards Steve
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/KHDMNMW6... Code of Conduct: http://python.org/psf/codeofconduct/
I think it's also worth noting that a missing "`__iter__` that returns self" is trivial to recover from... just use a new reference to the iterator instead. The overhead of a method call for this convention almost seems silly. What worries me most about changing the current "requirement" is that it may create either confusion or backward compatibility issues for `collections.abc.Iterator` (which is a subtype of `Iterable`, and thus requires `__iter__`).
On Tue, Sep 14, 2021 at 3:49 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
I think it's also worth noting that a missing "`__iter__` that returns self" is trivial to recover from... just use a new reference to the iterator instead. The overhead of a method call for this convention almost seems silly.
The use case is this:

    def foo(it):
        for x in it:
            print(x)

    def main():
        it = iter([1, 2, 3])
        next(it)
        foo(it)

Since "for x in it" calls iter(it), if the argument is an iterator that doesn't define __iter__, it would fail. But this is all about convention -- we want to make it convenient to do this kind of thing, so all standard iterators define __iter__ as well as __next__.
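A runnable version of Guido's use case, assuming a hypothetical `rest()` helper (a variant of `foo()` that returns a list instead of printing) and a hypothetical `NextOnly` class with no `__iter__`:

```python
class NextOnly:
    """Hypothetical iterator with __next__ but no __iter__."""

    def __init__(self, items):
        self._items = list(items)

    def __next__(self):
        if not self._items:
            raise StopIteration
        return self._items.pop(0)


def rest(it):
    """Collect whatever is left in ``it``; the for loop calls iter(it) first."""
    out = []
    for x in it:
        out.append(x)
    return out


std = iter([1, 2, 3])
next(std)          # consume the first element by hand
print(rest(std))   # [2, 3]: list_iterator.__iter__ returns self

broken = NextOnly([1, 2, 3])
next(broken)       # next() is happy, because __next__ exists
try:
    rest(broken)
except TypeError as exc:
    print("TypeError:", exc)  # the loop's implicit iter() call fails
```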
What worries me most about changing the current "requirement" is that it may create either confusion or backward compatibility issues for `collections.abc.Iterator` (which is a subtype of `Iterable`, and thus requires `__iter__`).
If you explicitly inherit from Iterator, you inherit a default implementation of __iter__ (that returns self, of course). If you merely register, it's up to you to comply. And sometimes people register things that don't follow the letter of the protocol, just to get things going. (This is common for complex protocols like Mapping, where some function you have no control over insists on a Mapping but only calls one or two common methods.) Duck typing is alive and kicking!

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
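A sketch of the two paths Guido contrasts, using the real `collections.abc.Iterator` ABC; `Countdown` and `NextOnly` are hypothetical classes. Inheriting supplies `__iter__` for free, while `register()` performs no checks at all.

```python
from collections.abc import Iterator


class Countdown(Iterator):
    """Inherits collections.abc.Iterator, so only __next__ is written here."""

    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class NextOnly:
    """Hypothetical class lacking __iter__."""

    def __init__(self, items):
        self._items = list(items)

    def __next__(self):
        if not self._items:
            raise StopIteration
        return self._items.pop(0)


Iterator.register(NextOnly)  # accepted without checking for __iter__

c = Countdown(3)
print(iter(c) is c)                       # True: inherited __iter__ returns self
print(isinstance(NextOnly([]), Iterator))  # True, by registration alone
print(hasattr(NextOnly, "__iter__"))       # False: complying is up to you
```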
Guido van Rossum wrote:
On Tue, Sep 14, 2021 at 3:49 PM Brandt Bucher brandtbucher@gmail.com wrote:
I think it's also worth noting that a missing "`__iter__` that returns self" is trivial to recover from... just use a new reference to the iterator instead. The overhead of a method call for this convention almost seems silly.

The use case is this:
Yeah, I understand that. But what I'm hinting at is that the `GET_ITER` opcode and `iter` builtin *could* gracefully handle this situation when called on something that doesn't define `__iter__` but does define `__next__`. Pseudocode:

    def iter(o):
        if hasattr(o, "__iter__"):
            return o.__iter__()
        elif hasattr(o, "__next__"):
            # Oh well, o.__iter__() would have just returned o anyways...
            return o
        raise TypeError

This would be implemented at the lowest possible level, in `PyObject_GetIter`.
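Brandt's pseudocode can be tried out without touching the builtin by writing it as a separate helper. This is a sketch under the assumption that only the missing-`__iter__` case should change; `lenient_iter` and `NextOnly` are hypothetical names, and unlike the pseudocode the helper looks special methods up on the type (as the real `iter()` does) and leaves every other case, including the `__getitem__` fallback, to the builtin.

```python
class NextOnly:
    """Hypothetical iterator with __next__ but no __iter__."""

    def __init__(self, items):
        self._items = list(items)

    def __next__(self):
        if not self._items:
            raise StopIteration
        return self._items.pop(0)


def lenient_iter(o):
    """Hypothetical helper sketching the proposal; not a real builtin."""
    cls = type(o)
    if not hasattr(cls, "__iter__") and hasattr(cls, "__next__"):
        # "o.__iter__() would have just returned o anyways"
        return o
    # Everything else (the sequence fallback, the TypeError for
    # non-iterables) is left to the real iter().
    return iter(o)


n = NextOnly([7, 8])
print(lenient_iter(n) is n)       # True
print(list(lenient_iter([1, 2])))  # [1, 2]
```

The trade-off is the one raised in the reply: a missing `__iter__` stops being an error and passes silently.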
What worries me most about changing the current "requirement" is that it may create either confusion or backward compatibility issues for `collections.abc.Iterator` (which is a subtype of `Iterable`, and thus requires `__iter__`).

If you explicitly inherit from Iterator, you inherit a default implementation of __iter__ (that returns self, of course). If you merely register, it's up to you to comply. And sometimes people register things that don't follow the letter of the protocol, just to get things going. (This is common for complex protocols like Mapping, where some function you have no control over insists on a Mapping but only calls one or two common methods.)
Yeah, I was thinking about cases like `isinstance(o, Iterator)`, where `o` defines `__iter__` but not `__next__`. Even though this code might start returning the "right" answer, it's still a backward-compatibility break. Not sure what the true severity would be, though...
On Tue, Sep 14, 2021 at 4:33 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
Guido van Rossum wrote:
On Tue, Sep 14, 2021 at 3:49 PM Brandt Bucher brandtbucher@gmail.com wrote:
I think it's also worth noting that a missing "`__iter__` that returns self" is trivial to recover from... just use a new reference to the iterator instead. The overhead of a method call for this convention almost seems silly.

The use case is this:
Yeah, I understand that. But what I'm hinting at is that the `GET_ITER` opcode and `iter` builtin *could* gracefully handle this situation when called on something that doesn't define `__iter__` but does define `__next__`. Pseudocode:
def iter(o):
    if hasattr(o, "__iter__"):
        return o.__iter__()
    elif hasattr(o, "__next__"):
        # Oh well, o.__iter__() would have just returned o anyways...
        return o
    raise TypeError
This would be implemented at the lowest possible level, in `PyObject_GetIter`.
That seems like violating the Zen: "Errors should never pass silently." It would certainly have a ripple effect, since everyone who currently defines a __iter__ (in C or Python) that returns self would want to remove it, and documentation would need to be updated everywhere. I don't see this issue as important enough to do that. There are also probably multiple things that emulate iter() that would have to be updated to match, if builtin iter() starts changing its behavior. TBH I don't think there is an *actual* problem here. I think it's just about choosing the right wording for the glossary (which IMO does not have status as a source of truth anyway).
What worries me most about changing the current "requirement" is that it may create either confusion or backward compatibility issues for `collections.abc.Iterator` (which is a subtype of `Iterable`, and thus requires `__iter__`).

If you explicitly inherit from Iterator, you inherit a default implementation of __iter__ (that returns self, of course). If you merely register, it's up to you to comply. And sometimes people register things that don't follow the letter of the protocol, just to get things going. (This is common for complex protocols like Mapping, where some function you have no control over insists on a Mapping but only calls one or two common methods.)
Yeah, I was thinking about cases like `isinstance(o, Iterator)`, where `o` defines `__iter__` but not `__next__`.
(Did you mean the other way around? __iter__ without __next__ is an Iterable but not an Iterator. And isinstance() returns the right answer for this.)
Even though this code might start returning the "right" answer, it's still a backward-compatibility break. Not sure what the true severity would be, though...
The ABC Iterator does not define the concept Iterator though. And static type checking is not meant to exactly follow all the rules of the language anyway -- there are many approximations being made by static type checkers. Regarding the meaning of "requires", not all requirements are checked at runtime either. But I expect we won't be able to make everyone happy here. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
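To make the `isinstance()` point concrete: a sketch with two hypothetical classes, one defining only `__iter__` and one defining only `__next__`, checked against the real ABCs. `collections.abc.Iterator` recognises a class structurally only when it has both methods.

```python
from collections.abc import Iterable, Iterator


class IterOnly:
    """Hypothetical iterable: __iter__ but no __next__."""

    def __iter__(self):
        return iter([])


class NextOnly:
    """Hypothetical 'broken iterator': __next__ but no __iter__."""

    def __next__(self):
        raise StopIteration


print(isinstance(IterOnly(), Iterable), isinstance(IterOnly(), Iterator))  # True False
print(isinstance(NextOnly(), Iterable), isinstance(NextOnly(), Iterator))  # False False
```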
Guido van Rossum wrote:
TBH I don't think there is an *actual* problem here. I think it's just about choosing the right wording for the glossary (which IMO does not have status as a source of truth anyway).
Good point. I'm probably approaching this from the wrong angle (by trying to "fix" the language, rather than the docs).
If it helps, I have tons of code that tests for iterators using:

    iter(obj) is obj

That has been a documented requirement for the iterator protocol forever. It's in the PEP.

    "A class that wants to be an iterator should implement two methods: a next() method that behaves as described above, and an __iter__() method that returns self."

https://www.python.org/dev/peps/pep-0234/

We have objects such that:

    iter(obj)

returns an iterator, but aren't themselves iterators. The most common example of that would be, I think, classes that define __iter__ as a generator method:

    class A:
        def __iter__(self):
            for x in range(10):
                yield x

Then we have actual iterators, like iter(A()). They define `__iter__` that returns self.

I don't know what I would call an object that only has __next__, apart from "broken" :-(

-- Steve
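Steven's `iter(obj) is obj` test, wrapped in a hypothetical `is_iterator()` helper so that it also returns a boolean for non-iterables. Note that for an iterable that is not an iterator, the check creates (and discards) a fresh iterator; that is harmless for containers but is a side effect worth knowing about. `NextOnly` is a hypothetical class; `A` is Steven's example.

```python
def is_iterator(obj):
    """Steven's test: an iterator is an object whose iter() is itself."""
    try:
        return iter(obj) is obj
    except TypeError:
        # Not iterable at all, including objects with only __next__.
        return False


class A:
    def __iter__(self):
        for x in range(10):
            yield x


class NextOnly:
    def __next__(self):
        raise StopIteration


print(is_iterator(A()), is_iterator(iter(A())))  # False True
print(is_iterator(NextOnly()))                   # False
```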
On Tue, Sep 14, 2021 at 9:31 PM Steven D'Aprano <steve@pearwood.info> wrote:
If it helps, I have tons of code that tests for iterators using:
iter(obj) is obj
That has been a documented requirement for the iterator protocol forever. It's in the PEP.
"A class that wants to be an iterator should implement two methods: a next() method that behaves as described above, and an __iter__() method that returns self."
However, the description clarifies that the reason for requiring __iter__ is weaker than the reason for requiring __next__.
We have objects such that:
iter(obj)
returns an iterator, but aren't themselves iterators.
Yeah, those are Iterables.
The most common example of that would be, I think, classes that define __iter__ as a generator method:
class A:
    def __iter__(self):
        for x in range(10):
            yield x
Then we have actual iterators, like iter(A()). They define `__iter__` that returns self.
I don't know what I would call an object that only has __next__, apart from "broken" :-(
It's still an iterator, since it duck-types in most cases where an iterator is required (notably "for", which is the primary use case for the iteration protocols -- it's in the first sentence of PEP 234's abstract). -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Tue, Sep 14, 2021 at 09:38:38PM -0700, Guido van Rossum wrote:
I don't know what I would call an object that only has __next__, apart from "broken" :-(
It's still an iterator, since it duck-types in most cases where an iterator is required (notably "for", which is the primary use case for the iteration protocols -- it's in the first sentence of PEP 234's abstract).
I don't think it duck-types as an iterator. Here's an example:

    class A:
        def __init__(self):
            self.items = [1, 2, 3]

        def __next__(self):
            try:
                return self.items.pop()
            except IndexError:
                raise StopIteration

    class B:
        def __iter__(self):
            return A()

It's fine to iterate over B() directly, but you can't iterate over A() at all. If you try, you get a TypeError:

    >>> for item in A(): pass
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'A' object is not iterable

In practice, this impacts some very common techniques. For instance, pre-calling iter() on your input:

    >>> x = B()
    >>> it = iter(x)
    >>> for value in it: pass
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'A' object is not iterable

There are all sorts of reasons why one might pre-call iter(). One common one is to pre-process the first element:

    it = iter(obj)
    first = next(it, None)
    for item in it:
        ...

Another is to test for an iterable: iter(obj) will raise TypeError if obj is not a sequence, collection, iterator, iterable, etc.

Another is to break out of one loop and then run another:

    it = iter(obj)
    for x in it:
        if condition:
            break
        do_something()

    for x in it:
        something_else()

I'm sure there are others I haven't thought of.

I believe that iterable objects that define `__next__` but not `__iter__` are fundamentally broken. If they happen to work in some circumstances but not others, that's because the iterator protocol is relaxed enough to work with broken iterators :-)

-- Steve
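Adding the one missing method to Steven's `A` is enough to make the pre-call idioms he lists work. This sketch assumes that fix; `first_and_rest` and `split_on` are hypothetical helper names for his first and third idioms.

```python
class A:
    """Steven's A with the one method it was missing."""

    def __init__(self):
        self.items = [1, 2, 3]

    def __iter__(self):
        return self  # makes A both an iterator and an iterable

    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration from None


class B:
    def __iter__(self):
        return A()


def first_and_rest(obj):
    """Idiom 1: pre-process the first element, then loop over the rest."""
    it = iter(obj)
    first = next(it, None)
    rest = [item for item in it]
    return first, rest


def split_on(obj, stop):
    """Idiom 3: break out of one loop, then resume in a second loop."""
    it = iter(obj)
    head = []
    for x in it:
        head.append(x)
        if x == stop:
            break
    tail = [x for x in it]
    return head, tail


print(first_and_rest(B()))  # (3, [2, 1])
print(split_on(B(), 2))     # ([3, 2], [1])
```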
On Tue, Sep 14, 2021 at 11:44 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Sep 14, 2021 at 09:38:38PM -0700, Guido van Rossum wrote:
I don't know what I would call an object that only has __next__, apart from "broken" :-(
It's still an iterator, since it duck-types in most cases where an iterator is required (notably "for", which is the primary use case for the iteration protocols -- it's in the first sentence of PEP 234's abstract).
I don't think it duck-types as an iterator. Here's an example:
class A:
    def __init__(self):
        self.items = [1, 2, 3]

    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration
class B:
    def __iter__(self):
        return A()
It's fine to iterate over B() directly, but you can't iterate over A() at all. If you try, you get a TypeError:
>>> for item in A(): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'A' object is not iterable
Yes, we all understand that. The reason I invoked "duck typing" is that as long as you don't use the iterator in a situation where iter() is called on it, it works fine. Just like a class with a readline() method works fine in some cases where a file is expected.
In practice, this impacts some very common techniques. For instance, pre-calling iter() on your input.
>>> x = B()
>>> it = iter(x)
>>> for value in it: pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'A' object is not iterable
There are all sorts of reasons why one might pre-call iter(). One common one is to pre-process the first element:
it = iter(obj)
first = next(it, None)
for item in it:
    ...
Another is to test for an iterable. iter(obj) will raise TypeError if obj is not a sequence, collection, iterator, iterable etc.
Another is to break out of one loop and then run another:
it = iter(obj)
for x in it:
    if condition:
        break
    do_something()

for x in it:
    something_else()
I'm sure there are others I haven't thought of.
No-one is arguing that an iterator that doesn't define __iter__ is great. And the docs should continue to recommend strongly to add an __iter__ method returning self. My only beef is with over-zealous people who might preemptively want to reject an iterator at runtime that only has __next__; in particular "for" and iter() have no business checking for this attribute ("for" only needs __next__, and iter() only should check for the minimal version of the protocol to reject things without __next__).
I believe that iterable objects that define `__next__` but not `__iter__` are fundamentally broken. If they happen to work in some circumstances but not others, that's because the iterator protocol is relaxed enough to work with broken iterators :-)
Your opinion is loud and clear. I just happen to disagree. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
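Guido's "minimal version of the protocol" can be observed directly: in CPython, `iter()` rejects an `__iter__` result that lacks `__next__`, but does not look for `__iter__` on it. The classes and the `iter_accepts` helper below are hypothetical.

```python
class Plain:
    """Has neither __iter__ nor __next__."""


class NextOnly:
    def __next__(self):
        raise StopIteration


class ReturnsPlain:
    def __iter__(self):
        return Plain()      # not an iterator at all


class ReturnsNextOnly:
    def __iter__(self):
        return NextOnly()   # has __next__, lacks __iter__


def iter_accepts(obj):
    """Report whether the builtin iter() accepts obj without raising TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(iter_accepts(ReturnsPlain()))     # False: result lacks __next__
print(iter_accepts(ReturnsNextOnly()))  # True: result is not checked for __iter__
```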
Guido:
It's still an iterator, since it duck-types in most cases where an iterator is required (notably "for", which is the primary use case for the iteration protocols -- it's in the first sentence of PEP 234's abstract).
D'Aprano:
I don't think it duck-types as an iterator. Here's an example:
class A:
    def __init__(self):
        self.items = [1, 2, 3]

    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration
>>> for item in A(): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'A' object is not iterable
Guido:
Yes, we all understand that. The reason I invoked "duck typing" is that as long as you don't use the iterator in a situation where iter() is called on it, it works fine.
I'm confused.
- a "broken" iterator should be usable in `for`;
- `A` is a broken iterator; but
- `A()` is not usable in `for`.
What am I missing?
-- ~Ethan~
On Wed, Sep 15, 2021 at 3:54 PM Ethan Furman <ethan@stoneleaf.us> wrote:
Steven's class A is the kind of class a custom sequence might return from its __iter__ method. E.g.
    class S:
        def __iter__(self):
            return A()
Now this works:
    for x in S(): ...
However this doesn't:
    for x in iter(S()): ...
In Steven's view, A does not deserve to work in the former case: Because A is a "broken" iterator, he seems to want it rejected by the iter() call that is *implicit* in the for-loop.
Reminder about how for-loops work:
This:
    for x in seq:
        <body>
translates (roughly) to this:
    _it = iter(seq)
    while True:
        try:
            x = next(_it)
        except StopIteration:
            break
        <body>
--Guido van Rossum (python.org/~guido)
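The translation above can be run directly. This sketch uses Steven's class A as posted, plus Guido's S. It implements the desugared loop as a function, `desugared_for`, and adds a hypothetical `works_in_for` probe; it ignores `else`, `break` and `continue`. The sketch shows why `for x in S()` succeeds while `for x in iter(S())` fails. In the second case the loop's implicit `iter()` is applied to an A instance, which has no `__iter__`.

```python
class A:
    """Steven's class A: __next__ only, no __iter__."""
    def __init__(self):
        self.items = [1, 2, 3]
    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration

class S:
    """Guido's custom sequence whose __iter__ hands out an A."""
    def __iter__(self):
        return A()

def desugared_for(seq, body):
    """Rough equivalent of `for x in seq: body(x)` (no else/break/continue)."""
    _it = iter(seq)             # the implicit iter() call
    while True:
        try:
            x = next(_it)
        except StopIteration:
            break
        body(x)

def works_in_for(obj):
    """True if a real for-loop over obj runs without a TypeError."""
    try:
        for _ in obj:
            pass
    except TypeError:
        return False
    return True

out = []
desugared_for(S(), out.append)
print(out)                      # [3, 2, 1]
print(works_in_for(S()))        # True
print(works_in_for(iter(S())))  # False: the loop calls iter() on an A
```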
Note: I am all for not enforcing anything here -- let's keep duck typing alive! If static type checkers want to be more pedantic, they can be -- that's kinda what they are for :-)
But the OP wrote:
"""
"[i]terators are required to have an __iter__() <https://docs.python.org/3/reference/datamodel.html#object.__iter__> method" which neither `for` nor `iter()` actually enforce.
"""
I'm confused -- as far as I can tell `for` does enforce this -- well, it doesn't enforce it, but it does require it, which is the same thing, yes? But does it need to?
On Wed, Sep 15, 2021 at 4:07 PM Guido van Rossum <guido@python.org> wrote:
Reminder about how for-loops work:
This:
    for x in seq:
        <body>
translates (roughly) to this:
    _it = iter(seq)
    while True:
        try:
            x = next(_it)
        except StopIteration:
            break
        <body>
exactly -- that call to iter is always made, yes?
The "trick" here is that we want it to be easy to use a for loop with either an iterable or an iterator. Otherwise, we would require people to write:
    for i in iter(a_sequence):
        ...
which I doubt anyone would want, backward compatibility aside. And since iter() is going to always get called, we need __iter__ methods that return self.
However, I suppose one could do a for loop something like this instead.
    _it = seq
    while True:
        try:
            x = next(_it)
        except TypeError:
            _it = iter(_it)
            x = next(_it)
        except StopIteration:
            break
        <body>
That is, instead of making every iterator an iterable, keep the two concepts more distinct:
An "Iterator" has a __next__ method that returns an item or raises StopIteration.
An "Iterable" has an __iter__ method that returns an iterator.
That would mean that one couldn't write a single class that is both an iterable and an iterator, and uses (abuses) __iter__ to reset itself. But would that be a bad thing?
Anyway, this is just a mental exercise, I am not suggesting changing anything.
-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov
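Here is a runnable sketch of a variant of that thought experiment; it is not a proposal. It deliberately uses a different check: instead of calling `next()` and catching TypeError, it tests for `__next__` up front. There are two reasons. A TypeError raised inside some object's `__next__` would otherwise be misread as "not an iterator". Also, in the version above, a StopIteration raised by the `next()` inside the `except TypeError` handler (for example, for an empty list) would not be caught by the sibling `except StopIteration` clause. `next_first_loop` is a made-up name.

```python
class A:
    """Steven's example: defines __next__ but not __iter__."""
    def __init__(self):
        self.items = [1, 2, 3]
    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration

def next_first_loop(seq, body):
    """Hypothetical loop: use seq directly if it has __next__, else call iter() once."""
    _it = seq if hasattr(type(seq), "__next__") else iter(seq)
    while True:
        try:
            x = next(_it)
        except StopIteration:   # also covers an empty iterable
            break
        body(x)

out = []
next_first_loop(A(), out.append)  # works without A defining __iter__
print(out)                        # [3, 2, 1]
```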
On Thu, 16 Sept 2021 at 01:30, Chris Barker via Python-Dev <python-dev@python.org> wrote:
""" "[i]terators are required to have an __iter__() method" which neither `for` nor `iter()` actually enforce. """
I'm confused -- as far as I can tell `for` does enforce this -- well, it doesn't enforce it, but it does require it, which is the same thing, yes? But does it need to?
for enforces that *iterables* have an __iter__ method, not *iterators*. (for takes an iterable, not an iterator, and uses __iter__ to *get* an iterator from it). The debate here is (I think!) whether an *iterator* that is not also an *iterable* is a valid iterator. IMO it is valid (because that's what the definitions say, basically) but it may not be *useful* in certain circumstances, and it definitely may not be *expected* (because nearly all iterators are iterables). "Broken" is a strong word to use, though, and that might be why the debate is continuing this long... Paul
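One way to see the distinction Paul draws is to ask the `collections.abc` ABCs, whose `isinstance` checks look for the methods structurally. The `classify` helper below is hypothetical. It reports whether an object has `__iter__` (what `for` needs), `__next__` (what `next()` needs), and both (the full Iterator ABC). Steven's class A lands on the "next only" side.

```python
from collections.abc import Iterable, Iterator

class A:
    """Steven's class A: __next__ but no __iter__."""
    def __init__(self):
        self.items = [1, 2, 3]
    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration

def classify(obj):
    """Hypothetical helper: (is Iterable, has __next__, is full Iterator)."""
    return (
        isinstance(obj, Iterable),       # has __iter__: what `for` needs
        hasattr(type(obj), "__next__"),  # what next() needs
        isinstance(obj, Iterator),       # both __iter__ and __next__
    )

print(classify([1, 2]))        # (True, False, False): iterable, not an iterator
print(classify(iter([1, 2])))  # (True, True, True): a conventional iterator
print(classify(A()))           # (False, True, False): iterator half only
```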
On 9/16/2021 3:02 AM, Paul Moore wrote:
The debate here is (I think!) whether an *iterator* that is not also an *iterable* is a valid iterator.
This framing of the question seems biased in that it initially uses 'iterator' to mean 'object with __next__ but not __iter__' when the propriety of that equating is at least half of the debate.
IMO it is valid (because that's what the definitions say, basically)
The definitions pretty much answer the question above in the negative.
https://www.python.org/dev/peps/pep-0234/
C-API: "Iterators ought to implement the tp_iter slot as returning a reference to themselves; this is needed to make it possible to use an iterator (as opposed to a sequence) in a for loop."
Python-API: "A class that wants to be an iterator should implement two methods: a next() method that behaves as described above, and an __iter__() method that returns self." ... "Iterators are currently required to support both protocols."
The clear intention is that iterators be usable as iterables.
https://docs.python.org/3/glossary.html
iterator: "Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted."
but it may not be *useful* in certain circumstances, and it definitely may not be *expected* (because nearly all iterators are iterables). "Broken" is a strong word to use, though, and that might be why the debate is continuing this long...
I think 'semi-iterator' might be a better term, definitely more neutral, for an object that is maybe duck-type usable as an iterator and maybe not. For Python code, I currently do not see a reason to omit the minimal "def __iter__(self): return self". I don't know about C code.
-- Terry Jan Reedy
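Pure-Python code can also get that one-liner without writing it. As far as I know, `collections.abc.Iterator` supplies `__iter__` returning `self` as a mixin method, so a subclass only has to define `__next__`. Here is a minimal sketch; the `Countdown` class is invented for illustration.

```python
from collections.abc import Iterator

class Countdown(Iterator):
    """Counts down from start; only __next__ is written by hand."""
    def __init__(self, start):
        self.n = start
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

c = Countdown(3)
print(iter(c) is c)   # True: the inherited __iter__ returns self
print(list(c))        # [3, 2, 1]
```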
On Wed, Sep 15, 2021 at 04:01:31PM -0700, Guido van Rossum wrote:
Steven's class A is the kind of class a custom sequence might return from its __iter__ method. E.g.
    class S:
        def __iter__(self):
            return A()
Correct, where A itself has a `__next__` method but no `__iter__` method.
Now this works:
for x in S(): ...
Agreed.
However this doesn't:
for x in iter(S()): ...
Correct, but *in practice* nobody would actually write it like that, since that would be silly. But what can happen is that one might have earlier called iter() directly, and only afterwards used the result in a for loop.
    it = iter(S())  # assert isinstance(it, A)
    ...
    for x in it: ...
Or we can short-cut the discussion and just write it like this:
    for x in A(): ...
which clearly fails because A has no `__iter__` method. When we write it like that, it is clear that A is not an iterator. The waters are only muddied because *most of the time* we don't write it like that, we do the simplest thing that can work:
    for x in S(): ...
which does work. So the question is, in that last snippet, the version that *does* work, what are we iterating over? Are we iterating over S() or A()?
I think the answer is Yes :-)
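Here is a sketch of the "call iter() earlier, loop later" scenario. It uses Steven's class A and a variant of S whose iterator class is a constructor parameter (`iter_cls`, a hypothetical addition). With plain A, the later `for` fails because it calls `iter()` on the A instance. Adding the recommended `__iter__` returning `self` (`FixedA`) makes the same code work.

```python
class A:
    """Steven's class A: __next__ but no __iter__."""
    def __init__(self):
        self.items = [1, 2, 3]
    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration

class FixedA(A):
    """Same as A plus the recommended one-liner."""
    def __iter__(self):
        return self

class S:
    """Sequence-like wrapper; iter_cls is a hypothetical parameter for this demo."""
    def __init__(self, iter_cls=A):
        self.iter_cls = iter_cls
    def __iter__(self):
        return self.iter_cls()

def loop_after_iter(seq):
    """Mimic `it = iter(seq)` followed later by `for x in it`; None if the loop fails."""
    it = iter(seq)              # succeeds for A and FixedA (both have __next__)
    items = []
    try:
        for x in it:            # implicitly calls iter(it) again
            items.append(x)
    except TypeError:
        return None
    return items

print(loop_after_iter(S(A)))       # None
print(loop_after_iter(S(FixedA)))  # [3, 2, 1]
```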
In Steven's view, A does not deserve to work in the former case: Because A is a "broken" iterator, he seems to want it rejected by the iter() call that is *implicit* in the for-loop.
No, I'm not arguing that.
1. It's not a matter of "deserves", it is that A instances cannot be used *directly* in a for loop, because they have no `__iter__` method.
2. I don't want iter() or the for loop to reject *S* instances just because A instances don't have `__iter__`.
3. I don't need to propose that for loops reject A instances, since they already do that. That's the status quo, and it's working correctly according to the iterator protocol.
The bottom line here is that I'm not asking for any runtime changes here at all. Perhaps improving the docs would be a good thing, and honestly I'm unsure what typeshed should do. I suppose that depends on whether you see the role of static type checking to be as strict as possible or as forgiving as possible. If you want your type checking to be strict, then maybe you want it to flag A as not an iterator. If you want it to accept anything that works, maybe you want it to allow A as an iterator.
On the typeshed issue, Akuli comments that they have a policy of preferring false negatives. So I think that nothing needs to be done?
https://github.com/python/typeshed/issues/6030#issuecomment-918544344
-- Steve
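The strict-versus-forgiving choice for type checkers can be sketched with `typing.Protocol`. `SupportsNext` below is defined locally for illustration; it is an assumption, not an import from typeshed. Annotating a parameter with it accepts Steven's A. Annotating with `Iterator[int]` would instead lead a static checker such as mypy to reject A, because `Iterator` also requires `__iter__`. The runtime `isinstance` checks only test whether the methods are present.

```python
from collections.abc import Iterator
from typing import Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)

@runtime_checkable
class SupportsNext(Protocol[T_co]):
    """Lenient, locally defined protocol: anything next() can be called on."""
    def __next__(self) -> T_co: ...

class A:
    """Steven's class A: __next__ but no __iter__."""
    def __init__(self) -> None:
        self.items = [1, 2, 3]
    def __next__(self) -> int:
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration

def take_two(it: SupportsNext[int]) -> list[int]:
    """Lenient signature; with Iterator[int] a static checker would reject A."""
    return [next(it), next(it)]

print(isinstance(A(), SupportsNext))  # True: only __next__ is checked
print(isinstance(A(), Iterator))      # False: __iter__ is missing
print(take_two(A()))                  # [3, 2]
```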
On Wed, Sep 15, 2021 at 4:06 PM Guido van Rossum <guido@python.org> wrote:
[SNIP] Reminder about how for-loops work:
This:
    for x in seq:
        <body>
translates (roughly) to this:
    _it = iter(seq)
    while True:
        try:
            x = next(_it)
        except StopIteration:
            break
        <body>
And if anyone wants more details on this, I have a blog post about it at https://snarky.ca/unravelling-for-statements/ .
On Wed, Sep 15, 2021 at 08:57:58AM -0700, Guido van Rossum wrote: [...]
Yes, we all understand that. The reason I invoked "duck typing" is that as long as you don't use the iterator in a situation where iter() is called on it, it works fine. Just like a class with a readline() method works fine in some cases where a file is expected.
Okay, you've convinced me that perhaps duck typing is an appropriate term to use. But I hope we wouldn't be arguing that a class with only a readline() method *is* a file object and changing the docs to support that view :-) [...]
No-one is arguing that an iterator that doesn't define __iter__ is great.
I'm arguing that it's not an iterator at all, even if you can use it in place of an iterator under some circumstances. As you pointed out, there is already a name for that: iterable.
And the docs should continue to recommend strongly to add an __iter__ method returning self.
Agreed. That's part of the iterator protocol. If some objects don't need to support the full iterator protocol in order to get the job done, then that's great, and people should be allowed to support only the part of the protocol they need.
My only beef is with over-zealous people who might preemptively want to reject an iterator at runtime that only has __next__; in particular "for" and iter() have no business checking for this attribute ("for" only needs __next__, and iter() only should check for the minimal version of the protocol to reject things without __next__).
Again, I agree. `for` and iter() should only check for the minimum of what they need.
I believe that iterable objects that define `__next__` but not `__iter__` are fundamentally broken. If they happen to work in some circumstances but not others, that's because the iterator protocol is relaxed enough to work with broken iterators :-)
Your opinion is loud and clear. I just happen to disagree.
I think we're in violent agreement here :-) Obligatory Argument Sketch video: https://www.youtube.com/watch?v=ohDB5gbtaEQ -- Steve
FYI I opened https://github.com/python/cpython/pull/29170 to loosen/correct the definition of "iterator", but I got push-back on the PR and this thread never reached a clear conclusion. As such I'll ask the SC to make a call.
participants (10)
- Brandt Bucher
- Brett Cannon
- Chris Barker
- Ethan Furman
- Guido van Rossum
- Patrick Reader
- Paul Moore
- Steve Holden
- Steven D'Aprano
- Terry Reedy