Andrew Barnert via Python-ideas writes:
Which is why it's not wrong to say that a range object is an iterator, but it IS wrong to say that it's Just an iterator ...
No, they’re not iterators. You’ve got it backward—every iterator is an iterable, but most iterables are not iterators.
An iterator is an iterable that has a __next__ method and returns self from __iter__. Lists, tuples, dicts, etc. are not iterators, and neither are ranges or the dict views.
[example snipped]
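For reference, the definition quoted above can be checked directly. This is only a rough sketch, not the snipped example; the helper name is_iterator is made up for illustration.

```python
from collections.abc import Iterable

def is_iterator(obj):
    """Hypothetical helper: True if obj has __next__ and returns itself from __iter__."""
    return hasattr(obj, "__next__") and iter(obj) is obj

samples = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "dict": {"a": 1},
    "range": range(3),
    "dict_keys": {"a": 1}.keys(),
    "list_iterator": iter([1, 2, 3]),
    "range_iterator": iter(range(3)),
}

# Every sample is iterable; only the last two are iterators.
results = {name: is_iterator(obj) for name, obj in samples.items()}
for name, obj in samples.items():
    print(f"{name:15} iterable={isinstance(obj, Iterable)} iterator={results[name]}")
```

Only the objects returned by iter() pass the check; range and dict_keys do not.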
A lot of people get this confused. I think the problem is that we don’t have a word for “iterable that’s not an iterator”,
I think part of the problem is that people rarely see explicit iterator objects in the wild. Most of the time we encounter iterator objects only implicitly. Nomenclature *is* a problem (I still don't know what a "generator" is: a function that contains "yield" in its def, or the result of invoking such a function), but part of the reason for that is that Python successfully hides objects like iterators and generator objects much of the time (I use generator expressions a lot, "yield" rarely).
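To make the two candidate meanings of "generator" concrete, here is a small sketch using the inspect module, which has a separate predicate for each. The function countdown is just an illustrative name.

```python
import inspect

def countdown(n):
    """A function with `yield` in its body (a generator function)."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)                   # calling it gives a generator object
genexp = (x * x for x in range(3))   # a generator expression gives one too

print(inspect.isgeneratorfunction(countdown))  # the def itself
print(inspect.isgenerator(gen))                # the result of calling it
print(inspect.isgenerator(genexp))             # no def or yield in sight
print(iter(gen) is gen)                        # generator objects are iterators
```

The generator expression case is the one where the iterator object is most thoroughly hidden: no function is ever named or called.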
or for the refinement “iterable that’s not an iterator and is reusable”, much less the further refinement “iterable that’s reusable, providing a distinct iterator that starts from the head each time, and allows multiple such iterators in parallel”.
Aside: Does "multiple parallel iterators" add anything to "distinct iterator that starts from the head each time"? Or did you mean what I would express as "and *so* it allows multiple parallel iterators"?
But that last thing is exactly the behavior you expect from “things like list, dict, etc.”, and it’s hard to explain, and therefore hard to document.
Um, you just did *explain* it, quite well IMHO, you just didn't *name* it. ;-)
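A nested loop over the same object is probably the shortest way to see the behavior being described. A sketch, with pairs as a made-up helper:

```python
def pairs(iterable):
    """Nested loop over one object; each `for` calls iter() on it."""
    return [(a, b) for a in iterable for b in iterable]

r = range(2)
from_range = pairs(r)        # each loop gets its own fresh iterator
again = pairs(r)             # and the range is reusable afterwards

it = iter(range(2))
from_iterator = pairs(it)    # both loops share (and drain) one iterator

print(from_range)
print(again)
print(from_iterator)
```

With the range, the inner loop restarts from the head for every outer item. With the iterator, the inner loop consumes what the outer loop would have seen next, so only a single pair comes out.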
The closest word for that is “collection”, but Collection is also a protocol that adds being a Container and being Sized on top of being Iterable, so it’s misleading unless you’re really careful. So the docs don’t clearly tell people that range, dict_keys, etc. are exactly that “like list, dict, etc.” thing, so people are confused about what they are. People know they’re lazy, they know iterators are lazy,
I'm not sure what "lazy" means here. range is lazy: the index it reports doesn't exist anywhere in the program's data until it computes it. But I wouldn't call a dict view "lazy" any more than I'd call the underlying dict "lazy". Views are references, or alternative access interfaces if you like. But the data for the view already exists.
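The distinction I have in mind looks roughly like this (a sketch; the sizes are arbitrary):

```python
import sys

big = range(10**9)             # no billion-element storage is allocated
print(sys.getsizeof(big))      # small, regardless of the length
print(big[10**8], len(big))    # values are computed on demand

d = {"a": 1}
keys = d.keys()                # a view: the keys are not copied
d["b"] = 2                     # mutate the dict after taking the view
print(list(keys))              # the view sees the new key
```

The range manufactures its values; the view just looks at data the dict already holds.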
so they think they’re a kind of iterator, and the docs don’t ever make it clear why that’s wrong.
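The ABCs in collections.abc do draw this line, for anyone who asks them. A quick sketch:

```python
from collections.abc import Collection, Iterator

d = {"a": 1}
objects = {
    "list": [1, 2],
    "range": range(3),
    "dict_keys": d.keys(),
    "dict_items": d.items(),
    "list_iterator": iter([1, 2]),
}

# Record (is a Collection, is an Iterator) for each object.
table = {
    name: (isinstance(obj, Collection), isinstance(obj, Iterator))
    for name, obj in objects.items()
}
for name, (is_coll, is_iter) in table.items():
    print(f"{name:14} Collection={is_coll} Iterator={is_iter}")
```

range and the dict views land on the same side as list, and the list iterator lands on the other.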
I don't think the problem is in the docs. Iterators and views aren't the only things that are lazy, here. People are even lazier! :-) Of course that's somewhat unfair, but in a technical sense quite true: most people don't read the docs until they run into trouble getting the program to behave as they want.
There are no types in Python’s stdlib that have the behavior you suggested of being an iterator but resetting each time you iterate. (The closest thing is file objects, but you have to manually reset them with seek(0).)
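The file-object case can be sketched as follows, using io.StringIO as a stand-in for an open text file so the example needs nothing on disk:

```python
import io

f = io.StringIO("alpha\nbeta\n")   # stand-in for an open text file
print(iter(f) is f)                # the file object is its own iterator

first = list(f)                    # reads every line
second = list(f)                   # already at end of file: nothing left
f.seek(0)                          # the manual reset
third = list(f)                    # same lines again

print(first, second, third)
```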
Isn't manual reset exactly what you want from a resettable iterator, though?
Sometimes you want an iterator on a file to reset (Emacs reads the last block of a Lisp library looking for a local variables block, then rereads the library to load it as Lisp -- you could do this with sequential access), and sometimes you want an interruptible iterator (mail message: read From_, read header, read body), and sometimes you want both (mbox file, you want to "unread" the From_ line of the next message).
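For the "unread" case, a rough sketch of what I mean. PushbackIterator and read_message are hypothetical names, not anything in the stdlib, and the mbox handling is deliberately simplified (it ignores From_ escaping and the blank-line rule).

```python
class PushbackIterator:
    """Hypothetical wrapper: an iterator whose items can be 'unread'."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._pushed = []          # items waiting to be re-delivered

    def __iter__(self):
        return self

    def __next__(self):
        if self._pushed:
            return self._pushed.pop()
        return next(self._it)

    def unread(self, item):
        self._pushed.append(item)


def read_message(lines):
    """Collect one message, stopping before the next 'From ' line."""
    message = [next(lines)]        # the From_ line opening this message
    for line in lines:
        if line.startswith("From "):
            lines.unread(line)     # belongs to the next message
            break
        message.append(line)
    return message


mbox = PushbackIterator([
    "From a@example.org", "Subject: one", "", "body one",
    "From b@example.org", "Subject: two", "", "body two",
])
first = read_message(mbox)
second = read_message(mbox)
print(first)
print(second)
```

The iterator is interruptible (the loop breaks and a later call picks up where it left off), and the one-item pushback is all the "reset" this use needs.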
I guess there are cases where you want to read a prefix repeatedly (eg, simulation of different models on the same underlying pseudo-random sequence), but I think they're very specialized.
Steve