On May 10, 2020, at 22:36, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andrew Barnert via Python-ideas writes:
A lot of people get this confused. I think the problem is that we don’t have a word for “iterable that’s not an iterator”,
I think part of the problem is that people rarely see explicit iterator objects in the wild. Most of the time we encounter iterator objects only implicitly.
We encounter iterators in the wild all the time, we just don’t usually _care_ that they’re iterators instead of “some kind of iterable”, and I think that’s the key distinction you’re looking for.
Certainly when you open a file, you usually deal with the file object. And whenever you feed the result of one genexpr into another, or into a map call, you are using an iterator. You often even store those iterators in variables.
But if you change that first genexpr to a listcomp (say, because you want to be able to breakpoint there and print it to the debugger, or dump it to a log), nothing changes except performance. And people know this and take advantage of it without even thinking. And that’s true of the majority of places you use iterators. Code that explicitly needs an iterator (like the grouper idiom where you zip an iterator with itself) certainly does exist, but it’s nowhere near as common as code that can use any iterable and only uses an iterator because that’s the easiest thing to write or the most efficient thing.
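To make that concrete, here is a minimal sketch (the names `grouper`, `squares_gen`, and so on are made up for illustration). The first half shows code that doesn't care whether it gets a genexpr or a listcomp; the second half shows the grouper idiom, which really does need an iterator.

```python
def grouper(iterable, n):
    # Zip the *same* iterator with itself n times, so each tuple
    # consumes n consecutive items. Any trailing partial group is dropped.
    it = iter(iterable)
    return zip(*[it] * n)

# Any-iterable code: swapping the genexpr for a listcomp changes nothing
# except when the work is done and how much memory is used.
squares_gen = (x * x for x in range(6))
squares_list = [x * x for x in range(6)]
total_from_gen = sum(map(lambda v: v + 1, squares_gen))
total_from_list = sum(map(lambda v: v + 1, squares_list))

# Iterator-dependent code: the shared iterator is what makes this pair items.
pairs = list(grouper(range(6), 2))

# Zipping a list with itself instead gives each item paired with itself,
# because every zip argument gets its own fresh iterator over the list.
not_pairs = list(zip(*[list(range(6))] * 2))
```

Here `pairs` comes out as consecutive pairs, while `not_pairs` is just each element doubled up, which is the one place in this sketch where "iterator" versus "some kind of iterable" actually matters.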
This is a big part of what I meant about the concepts being so nice that people manage to use them despite not being able to talk about them.
Nomenclature *is* a problem (I still don't know what a "generator" is: a function that contains "yield" in its def, or the result of invoking such a function), but part of the reason for that is that Python successfully hides objects like iterators and generator objects much of the time (I use generator expressions a lot, "yield" rarely).
You’re right. The fact that the concepts (and the implementation of those concepts) are so nice that we rarely have to think about these things explicitly is actually part of the reason it’s hard to do so on the rare occasions we need to. And put that way, it’s a pretty good tradeoff.
Still, having clear names with simple definitions would help that problem without watering down the benefits.
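For what it's worth, the stdlib `inspect` module does distinguish the two things "generator" gets used for. A small sketch, with `countdown` as a made-up example function:

```python
import inspect
from collections.abc import Iterator

def countdown(n):
    # A function with "yield" in its body: a generator function.
    while n > 0:
        yield n
        n -= 1

# Calling it runs none of the body; it just returns a generator object.
gen_obj = countdown(3)

is_gen_function = inspect.isgeneratorfunction(countdown)
is_gen_object = inspect.isgenerator(gen_obj)

# A generator expression also produces a generator object, with no "yield"
# or def in sight.
genexpr = (x for x in range(3))
genexpr_is_gen_object = inspect.isgenerator(genexpr)

# Generator objects are iterators, so they are one-shot.
gen_is_iterator = isinstance(gen_obj, Iterator)
first_pass = list(gen_obj)
second_pass = list(gen_obj)  # already exhausted
```

So the two meanings are at least separable in code, even if everyday usage blurs them.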
or for the refinement “iterable that’s not an iterator and is reusable”, much less the further refinement “iterable that’s reusable, providing a distinct iterator that starts from the head each time, and allows multiple such iterators in parallel”.
Aside: Does "multiple parallel iterators" add anything to "distinct iterator that starts from the head each time"? Or did you mean what I would express as "and *so* it allows multiple parallel iterators"?
I’m being redundant here to make sure I’m understood, because just saying it the second way apparently didn’t get the idea across the first time.
But that last thing is exactly the behavior you expect from “things like list, dict, etc.”, and it’s hard to explain, and therefore hard to document.
Um, you just did *explain* it, quite well IMHO, you just didn't *name* it. ;-)
Well, it was a long, and redundant, explanation, not something you’d want to see in the docs or even a PEP.
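The shorter version may be just to show it. A minimal sketch of the "distinct iterator that starts from the head each time, and allows multiple such iterators in parallel" behavior, using `range` (a list would behave the same way here):

```python
r = range(3)

# Each iter() call hands back a new, independent iterator.
it1 = iter(r)
it2 = iter(r)
distinct = it1 is not it2

# Advancing one does not affect the other.
a = next(it1)
b = next(it1)
c = next(it2)  # it2 still starts from the head

# The iterable itself is reusable: a full pass never uses it up.
first_pass = list(r)
second_pass = list(r)

# Contrast with an iterator, where iter() just returns the same object,
# so there is only ever one position.
same_object = iter(it1) is it1
```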
The closest word for that is “collection”, but Collection is also a protocol that adds being a Container and being Sized on top of being Iterable, so it’s misleading unless you’re really careful. So the docs don’t clearly tell people that range, dict_keys, etc. are exactly that “like list, dict, etc.” thing, so people are confused about what they are. People know they’re lazy, they know iterators are lazy,
I'm not sure what "lazy" means here. range is lazy: the index it reports doesn't exist anywhere in the program's data until it computes it. But I wouldn't call a dict view "lazy" any more than I'd call the underlying dict "lazy". Views are references, or alternative access interfaces if you like. But the data for the view already exists.
“lazy” as in it creates something that acts like a list or a set, but hasn’t actually stored a list or set or other data structure in memory or done a bunch of up-front CPU work. You’re right that a more precise definition would probably include range but not dict_keys, but I think people do use it in a way that includes both, and that’s part of the reason they’re equally confused into thinking both are iterators.
so they think they’re a kind of iterator, and the docs don’t ever make it clear why that’s wrong.
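A quick check against the ABCs in `collections.abc` shows why that's wrong. This is only a sketch, and the variable names are arbitrary:

```python
from collections.abc import Collection, Iterator

d = {"a": 1, "b": 2}
keys = d.keys()
r = range(3)

# Neither is an iterator; both are Collections (Iterable, Sized, Container).
keys_is_iterator = isinstance(keys, Iterator)
range_is_iterator = isinstance(r, Iterator)
keys_is_collection = isinstance(keys, Collection)
range_is_collection = isinstance(r, Collection)

# next() only works on iterators, so it fails on a keys view.
try:
    next(keys)
    next_failed = False
except TypeError:
    next_failed = True

# Iterating a view does not use it up, and the view tracks the dict.
before = list(keys)
again = list(keys)
d["c"] = 3
after = list(keys)
```

The view is "lazy" only in the loose sense that it never copied the keys; it behaves like a reusable collection, not like a one-shot iterator.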
I don't think the problem is in the docs. Iterators and views aren't the only things that are lazy, here. People are even lazier! :-) Of course that's somewhat unfair, but in a technical sense quite true: most people don't read the docs until they run into trouble getting the program to behave as they want.
Well, yes, but people writing proposals to change the language or designing PyPI libraries to extend it or writing StackOverflow answers to help other people learn it are getting it wrong, not just people using it day to day. Even when they cite quotes out of the docs, they still often get it wrong. Which makes me think the docs really are part of the problem.
And not having names for things, even if they _are_ well explained somewhere, makes that problem hard to solve. A shorthand description is usually vague and it’s not clear where to go to get clarification; a name is at least as vague but it’s obvious what to search for to get the exact definition (if there’s not already a link right there).
There are no types in Python’s stdlib that have the behavior you suggested of being an iterator but resetting each time you iterate. (The closest thing is file objects, but you have to manually reset them with seek(0).)
Isn't manual reset exactly what you want from a resettable iterator, though?
Yes. I certainly use seek(0) on files, and it’s a perfectly cromulent concept, it’s just not the concept I’d want on a range or a keys view or a sequence slice.
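For comparison, a small sketch of the manual-reset behavior, using `io.StringIO` as a stand-in for a real file opened in text mode (that substitution is only so the example is self-contained):

```python
import io

f = io.StringIO("first\nsecond\n")

# A file object is its own iterator: iter(f) gives back f itself.
is_own_iterator = iter(f) is f

first_pass = list(f)   # reads every line
second_pass = list(f)  # nothing left, the position is at the end

# The reset is explicit: nothing rewinds the file for you.
f.seek(0)
third_pass = list(f)
```

That's fine for files, but a range that needed a `seek(0)` between two `for` loops would surprise pretty much everyone.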