generator vs iterator etc. (was: How assignment should work with generators?)

On Mon, Nov 27, 2017 at 3:55 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I can see where this is coming from. The thing is that "iterator" and "generator" are mostly synonymous, except for two things:

(1) Generators are iterators that are produced by a generator function
(2) Generator functions are sometimes referred to as just "generators"

The concept of "generator" thus overlaps with both "iterator" and "generator function".

Then there's also "iterator" and "iterable", which are two different things:

(3) If `obj` is an *iterable*, then `it = iter(obj)` is an *iterator* (over the contents of `obj`)
(4) Iterators yield values, for example on explicit calls to next(it).

Personally I have leaned towards keeping a clear distinction between "generator function" and "generator", which leads to the situation that "generator" and "iterator" are mostly synonymous for me. Sometimes, for convenience, I use the term "generator" to refer to "iterators" more generally. This further seems to have a minor benefit that "generators" and "iterables" are less easily confused with each other than "iterators" and "iterables".

I thought about this issue some time ago for the `views` package, which has a separation between sequences (seq) and other iterables (gen):

https://github.com/k7hoven/views

The functionality provided by `views.gen` is not that interesting; it's essentially a subset of itertools functionality, but with an API that parallels `views.seq`, which works with sequences (iterable, sliceable, chainable, etc.). I used the name `gen` because iterator/iterable variants of the functionality can be implemented with generator functions (although also with other kinds of iterators/iterables). Calling the thing `iter` would have conflicted with the builtin `iter`.

HOWEVER, this naming can be confusing for those who lean more towards using "generator" to also mean "generator function", and for those who are comfortable with the term "iterator" despite its resemblance to "iterable".

Now I'm actually seriously considering renaming `views.gen` to `views.iter` when I have time. After all, there's already `views.range`, which "conflicts" with the builtin range.

Anyway, the point is that the naming is suboptimal.

SOLUTION: Maybe (a) all iterators should be called iterators or (b) all iterators should be called generators, regardless of whether they are somehow a result of a generator function having been called in the past.

(I'm not going into the distinction between things that can receive values via `send` or any other possible distinctions between different types of iterators and iterables.)

-- Koos

(discussion originated from python-ideas, but cross-posted to python-dev in case there's more interest there)

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
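Points (1) to (4) in the message above can be illustrated with a minimal sketch. It is not part of the original post; the names `countdown`, `data`, `it` and `gen` are hypothetical, chosen only for illustration.

```python
from collections.abc import Iterable, Iterator
from types import GeneratorType


def countdown(n):
    """A generator function: calling it returns a generator."""
    while n > 0:
        yield n
        n -= 1


data = [3, 2, 1]      # an iterable, but not itself an iterator
it = iter(data)       # point (3): an iterator over the contents of data
gen = countdown(3)    # point (1): an iterator produced by a generator function

# A list is iterable without being an iterator.
assert isinstance(data, Iterable) and not isinstance(data, Iterator)
# A list iterator is an iterator, but not a generator.
assert isinstance(it, Iterator) and not isinstance(it, GeneratorType)
# A generator is both.
assert isinstance(gen, Iterator) and isinstance(gen, GeneratorType)

first = next(it)      # point (4): an explicit call to next() yields a value
rest = list(gen)      # consuming the generator yields all of its values
```

Here `countdown` itself is what point (2) says is sometimes loosely called a "generator", even though it is an ordinary function object.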

On Mon, Nov 27, 2017 at 06:35:38PM +0200, Koos Zevenhoven wrote:
SOLUTION: Maybe (a) all iterators should be called iterators
All iterators *are* called iterators. Just as all mammals are called "mammals". The subset of iterators which are created as generators are *also* called generators, just as the mammals which are dogs are called "dogs" when it is necessary to distinguish a dog from some other mammal.
or (b) all iterators should be called generators, regardless of whether they are somehow a result of a generator function having been called in the past.
Absolutely not. That would be confusing -- it would be analogous to calling all sequences (lists, tuples, deques etc) "strings". What benefit is there to calling all iterators "generators", losing the distinction between those which are defined using def and yield and those created using iter()? Sometimes that distinction is important.

You are right that sometimes the term "generator" is used as shorthand for "generator function". Most of the time the distinction doesn't actually matter, since you cannot (easily?) create a generator without first creating a generator function. Or if it does matter, it is clear in context which is meant.

For those few times where it *does* matter, there is no substitute for precision in language, and that depends on the author, not the terminology.

-- Steve

Steven D'Aprano writes:
The subset of iterators which are created as generators are *also* called generators,
As long as we're being precise, I don't think that is precisely correct:

    >>> (x for x in range(1))
    <generator object <genexpr> at 0x10dee5e08>
    >>> iter(range(1))
    <range_iterator object at 0x10dab83f0>
    >>> iter((1,))
    <tuple_iterator object at 0x10df109b0>

The two iterators have the same duck-type, the generator is different. A generator (object) is, of course, an iterable.
You are right that sometimes the term "generator" is used as shorthand for "generator function".
I've always thought "generator factory" would be a better term, but "generator function" will do. I generally use "generator object" to make the distinction, though.
At least you can create a generator (object) with the generator function created and called implicitly by using a generator expression. Reverting from pedantic mode. Hear, hear! this:
Steve

On Tue, Nov 28, 2017 at 03:11:25PM +0900, Stephen J. Turnbull wrote:
How is the generator different? It quacks like a range_iterator and tuple_iterator, it swims like them, it flies like them. Is there some iterator method or protocol that generators don't support?
A generator (object) is, of course, an iterable.
And also an iterator:

    py> import collections.abc
    py> isinstance((x+1 for x in range(5)), collections.abc.Iterator)
    True
Ah yes, I forget about generator expressions, thanks. -- Steve

Given that we have this kind of arcane discussion fairly regularly (not just in this thread), and it always makes my head spin, and it seems I'm not the only one who gets confused:

How about having a module that provides functions such as

    isgenerator
    isiterator
    isiterable
    etc.

or alternatively one function that would return a tuple/list of categories that an object fell into, e.g. ('iterator', 'iterable'), or a dictionary, e.g. { 'iterator' : True, 'iterable' : True, 'generator' : False }. (Bikeshed as appropriate, but providing a dict seems to make it easier to add more things in future without breaking backward compatibility.)

Then those of us who are prepared to take care to be precise in our language but could do with some help could use it to clarify our thoughts. And there should be less noise in the newsgroups from pointless arguments about precisely what is what. And I suspect it would even have uses in "real" code.

Rob Cliffe

On 28/11/2017 06:22, Steven D'Aprano wrote:
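The dictionary variant of this proposal could look roughly like the sketch below. It is only an illustration built on existing standard-library checks; the function name `classify` and the exact set of keys are assumptions, not an existing or proposed API.

```python
import inspect
from collections.abc import Iterable, Iterator, Sequence


def classify(obj):
    """Hypothetical helper: map category names to True/False for obj."""
    return {
        "iterable": isinstance(obj, Iterable),
        "iterator": isinstance(obj, Iterator),
        "sequence": isinstance(obj, Sequence),
        "generator": inspect.isgenerator(obj),
        "generator function": inspect.isgeneratorfunction(obj),
    }


# A generator expression produces a generator, which is also an iterator.
genexp_categories = classify(x for x in range(3))
# A range is an iterable sequence, but neither an iterator nor a generator.
range_categories = classify(range(3))
```

One caveat with this approach: `isinstance(obj, Iterable)` only detects classes that define `__iter__` (or are registered as `Iterable`), so an object that is iterable solely through `__getitem__` would be reported as not iterable.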

On Tue, Nov 28, 2017 at 04:25:23PM +0000, Rob Cliffe wrote:
There is no single module that does this, but the inspect module comes close:

    inspect.isgenerator
    inspect.isgeneratorfunction

will tell you the difference between these two:

    def gen_function():
        yield 1

    generator = gen_function()

The collections.abc module has ABCs that you can use with isinstance:

    collections.abc.Iterable
    collections.abc.Iterator
    collections.abc.Sequence

For example, we know that range is not a generator but is a sequence:

    py> inspect.isgenerator(range(10))
    False
    py> isinstance(range(10), collections.abc.Sequence)
    True

-- Steve
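As a small runnable complement to the message above, the following sketch applies both inspect functions to the `gen_function` / `generator` pair it defines. The `results` dictionary is just an illustrative way of collecting the answers.

```python
import inspect


def gen_function():
    yield 1


generator = gen_function()

results = {
    # The function object is a generator function, not a generator.
    "function is generator function": inspect.isgeneratorfunction(gen_function),
    "function is generator": inspect.isgenerator(gen_function),
    # The object returned by calling it is a generator, not a function.
    "result is generator": inspect.isgenerator(generator),
    "result is generator function": inspect.isgeneratorfunction(generator),
    # An iterator obtained from iter() is not a generator.
    "list iterator is generator": inspect.isgenerator(iter([1])),
}
```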

Steven D'Aprano writes:
The two iterators have the same duck-type, the generator is different.
How is the generator different?
My bad, I got the comparison backward. The generator *is* different, but it's because the generator has *extra* public methods, not fewer. Sorry for the noise.
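The "extra public methods" can be seen by comparing the public attribute names of a generator with those of a tuple iterator. This is a minimal sketch; the helper `public_names` is hypothetical, and the full contents of `extra` vary between Python versions, so only the three generator methods are singled out.

```python
def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return {name for name in dir(obj) if not name.startswith("_")}


gen = (x for x in range(1))
tuple_it = iter((1,))

# Names the generator has that the tuple iterator lacks.
extra = public_names(gen) - public_names(tuple_it)

# The generator-specific methods are among them.
has_generator_methods = {"send", "throw", "close"} <= extra
```

Both objects still support the plain iterator protocol (`__iter__` and `__next__`), which is why the generator can be used anywhere the other iterators can.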

On 28 November 2017 at 16:11, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
While it's not obvious with the genexp (since they're anonymous), the main reason for the difference in the repr layouts here is just that generator iterators can have names:

    >>> def g(): yield
    ...
    >>> g()
    <generator object g at 0x7f93e5e41258>

So the statement that "generator iterators are iterators" is correct.

The functions that create them are called generator functions because they really are functions:

    >>> g
    <function g at 0x7f93f17f1ea0>

What's more unfortunate here is that the usage of "generator" in the generator-iterator representation doesn't actually align with the preferred terminology in the documentation: https://docs.python.org/3/glossary.html#term-generator

So I can understand the confusion here.

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
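The point about names can be checked directly: a generator iterator carries the name of the generator function that created it, and a generator expression gets the placeholder name `<genexpr>`. A minimal sketch, with `named` and `anonymous` as hypothetical variable names:

```python
def g():
    yield


named = g()                           # generator iterator created by g
anonymous = (x for x in range(1))     # generator iterator from a genexp

# Both are instances of the same type; only the name differs.
same_type = type(named) is type(anonymous)
names = (named.__name__, anonymous.__name__)
```

The name is what shows up in the repr, which is why `g()` prints as `<generator object g at ...>` while the genexp prints as `<generator object <genexpr> at ...>`.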

participants (5): Koos Zevenhoven, Nick Coghlan, Rob Cliffe, Stephen J. Turnbull, Steven D'Aprano