It took me a good while to "get" the distinction between an iterator and an iterable, and I still misuse those terms sometimes.
Maybe because iterable is an awkward word (that my spell checker doesn't recognize)?
But it's also because there is a clear definition for "Iterator" in Python, but the term is used a bit more generally in vague CS nomenclature.
The other confusion is that an iterable is not an iterator, but iterators are, in fact, iterables (i.e. you can call iter() on them).
I think this is mostly the result of the "for loop" protocol pre-dating the iteration protocol, and wanting to have the same nifty way to iterate over everything. That is -- we want to be able to use iterators in for loops, and not have to call iter() on anything before using a for loop. In fact, I think this is a nice convenience, and maybe one that would be kept in a new language anyway -- it's really handy that you can do A LOT without knowing about iter() and next() and StopIteration, while those tools are still there when needed.
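To make that concrete, here is a rough sketch of what a for loop does with iter(), next() and StopIteration. The helper name manual_for is made up for illustration; it is not how the interpreter actually spells it.

```python
def manual_for(obj):
    """Collect the items of obj roughly the way a for loop would visit them."""
    result = []
    it = iter(obj)              # works on any iterable, iterators included
    while True:
        try:
            item = next(it)
        except StopIteration:   # the signal that there is nothing left
            break
        result.append(item)
    return result


a_list = [1, 2, 3]
an_iterator = iter(a_list)

print(iter(an_iterator) is an_iterator)   # an iterator hands back itself
print(iter(a_list) is a_list)             # a list hands out a separate iterator
print(manual_for(a_list), manual_for(an_iterator))
```

Because iter() on an iterator just returns the iterator, the same loop machinery works for both kinds of object without the user ever calling iter() by hand.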
OK -- THAT was a digression.
On May 12, 2020, at 23:29, Stephen J. Turnbull <firstname.lastname@example.org> wrote:
>>>> A lot of people get this confused. I think the problem is that we
>>>> don’t have a word for “iterable that’s not an iterator”,
Isn't that simply an "Iterable"? As above, yes, all iterators are iterables, but when we speak of iterables specifically, we are usually referring to the ones that are not iterators.
> It *is* the distinction I'm making with the word "explicit". I never
> use "next" on an open file.
nor do I, but there was a conversation on this list a while back, with folks saying that they DID do that. The fact we don't may be because file objects have methods that predate them being iterators. I still don't really think of them as being primarily iterators (and they really aren't for binary files), but objects that happen to have the iteration protocol tacked on for convenience. So I use for loops when it's appropriate, and readline() and the like when it's not.
> Students often want to know why this doesn’t work:
    with open("file") as f:
        for line in f:
            ...
        for line in f:
            ...
… when this works fine:
    with open("file") as f:
        lines = f.readlines()
        for line in lines:
            ...
        for line in lines:
            ...
This question (or a variation on it) gets asked by novices every few days on StackOverflow; it’s one of the top common duplicates.
The answer is that files are iterators, while lists are… well, there is no word.
yes, there is -- they are "lists" :-) -- but if you want to be more general, they are Sequences. And that's actually an important distinction to make in this case -- the fact that calling readlines() reads the entire file into a list all at once is maybe more important than the fact that it doesn't get "exhausted" by looping through it.
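Here is a small, self-contained version of the two cases, using a throwaway temp file so it can be run as-is. The file contents and variable names are just for illustration.

```python
import os
import tempfile

# Write a small throwaway file so the example does not depend on anything else.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")
    path = tmp.name

with open(path) as f:
    first_pass = [line for line in f]
    second_pass = [line for line in f]    # the file is already at the end

with open(path) as f:
    lines = f.readlines()                 # the whole file is read into a list here
    list_first = [line for line in lines]
    list_second = [line for line in lines]

os.remove(path)

print(len(first_pass), len(second_pass))
print(len(list_first), len(list_second))
```

The second loop over the file sees nothing, while the list can be looped over as many times as you like, at the cost of holding every line in memory at once.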
My way to teach that is to say that:
    for line in a_file:
        ...
is analogous to:
    while True:
        line = a_file.readline()
        if not line:
            break
        ...
rather than:
    for line in a_file.readlines():
        ...
Or heck, simply say that readlines() reads the whole file at once into a list, and the file object has nothing to do with it anymore. Whereas looping through the lines in a for loop is getting the lines one by one from the file object, so once you've gotten them all, there are no more.
Which doesn't require me talking about iterators or iterables, or iter() or next().
There is a place to get into all that, but I don't think it needs to be that early in the game. And I've never had a problem with this in my intro classes.
Bringing this back to the original topic:
I suppose we *could* have a "file_view" object that acted like the list you get from readlines(), but actually called seek() on the underlying file to give you the lines lazily one at a time. That would be, shall we say, problematic, performance wise, but it could be done.
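A very rough sketch of what such a thing might look like, just to show it is possible. FileLineView is a made-up name, and this assumes a seekable file opened in binary mode so that byte offsets are well defined. It scans the file once up front to record where each line starts, then seeks on every access.

```python
import io
from collections.abc import Sequence


class FileLineView(Sequence):
    """Hypothetical read-only, list-like view of the lines of a seekable binary file."""

    def __init__(self, f):
        self._f = f
        self._offsets = []
        f.seek(0)
        pos = 0
        # One pass over the file to remember the byte offset of each line.
        for line in iter(f.readline, b""):
            self._offsets.append(pos)
            pos += len(line)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Kept simple: a slice gives a plain list of lines.
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("line index out of range")
        self._f.seek(self._offsets[index])   # a seek plus a read for every access
        return self._f.readline()


f = io.BytesIO(b"one\ntwo\nthree\n")   # stands in for a real file
view = FileLineView(f)
print(len(view), view[0], view[-1])
print(list(view) == list(view))        # can be looped over more than once
```

Every indexing operation costs a seek and a read, which is where the performance problem comes from.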
You can explain it anyway. In fact, you _have_ to give an explanation with analogies and examples and so on, and that would be true even if there were a word for what lists are. But it would be easier to explain if there were such a word, and if you could link that word to something in the glossary, and a chapter in the tutorial.
Still not sure why "Sequence" doesn't work here? Granted, there *are* some "iterables that aren't iterators" that aren't Sequences (like dict views), but they are Iterable Containers, and I think you can talk about them as "views" well enough.
Though now that I've written that, maybe we Should have "Iterable" and "Iterator" as ABCs.
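For what it's worth, the standard library's collections.abc module does provide Iterable, Iterator, Sequence and Container, so the categories above can be checked directly. A quick sketch (the dictionary of examples is just for illustration):

```python
from collections.abc import Container, Iterable, Iterator, Sequence

d = {"a": 1}
things = {
    "list": [1, 2, 3],
    "dict keys view": d.keys(),
    "list iterator": iter([1, 2, 3]),
}

for name, obj in things.items():
    print(
        name,
        "| Iterable:", isinstance(obj, Iterable),
        "| Iterator:", isinstance(obj, Iterator),
        "| Sequence:", isinstance(obj, Sequence),
    )

# The dict view is an iterable container, but neither a Sequence nor an Iterator.
print(isinstance(d.keys(), Container))
```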
> I agree there's "amortized zero" cost to the crowd who
> would use those names fairly frequently in design discussions, but
> there is a cost to the "lazy in the technical sense" programmer, who
> might want to read the documentation if it gave "simple answers to
> simple questions",
Sure, but we can still use the Simple answers, like "Sequence" as above in most cases.
> We shouldn’t define everything up front, just the most important things. But this is one of the most important things. People need to understand this distinction very early on to use Python, and many of them don’t get it, hence all the StackOverflow duplicates. People run into this problem well before they run into a problem that requires them to understand the distinction between arguments and parameters, or protocols and ABCs, or Mapping and dict.
That does not match with my experience at all. Yes, maybe the file as iterator example, but that can be explained without getting into the iteration protocol. I've taught intro to Python many years, and never felt the need to clearly define the iteration protocol early in the class.
And I have scientist-programmers on my team that are very productive after years that probably don't get it even now.
But the distinction between iterators and things-like-list-and-so-on comes up earlier, and a lot more often, so a word for that would buy us a lot more.
And "iterable" doesn't work?
The one difference between files and generators is that you can actually reset the file object by calling seek(0). But that doesn’t make file not an iterator. It just makes file an iterator with an extra feature that most iterators don’t have.
indeed -- and "resetting" is simply not part of the iterator protocol.
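A small sketch of that difference, using a temporary file and a toy generator (both made up for the example):

```python
import tempfile

with tempfile.TemporaryFile("w+") as f:
    f.write("one\ntwo\n")
    f.seek(0)
    file_is_own_iterator = iter(f) is f   # a file is its own iterator
    first = list(f)
    exhausted = list(f)                   # nothing left
    f.seek(0)                             # a file feature, not part of the iterator protocol
    again = list(f)


def gen():
    yield "one\n"
    yield "two\n"


g = gen()
g_first = list(g)
g_again = list(g)                         # no rewind; you have to call gen() again

print(file_is_own_iterator, first, exhausted, again)
print(g_first, g_again)
```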
And I’m pretty sure that’s exactly the confusion that led you to think that dict_keys have weird behavior, and to suggest the same weird behavior for sequence views
I'm not sure who "you" is in this sentence, but I think it may be nobody.
I think *I* started this "resettable iterator" idea because I did some IPython experimentation on dict_keys() at 2:00am, and made a stupid mistake, which led me to believe that dict_keys had this weird resettable property. But that was a mistake on my part, and I can't even replicate what I did to give myself that idea.
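For the record, a quick check of how dict_keys actually behaves. It is an iterable view, not an iterator, so there is nothing to reset:

```python
d = {"a": 1, "b": 2}
keys = d.keys()

first = list(keys)
second = list(keys)                  # a view is not used up by iterating over it

has_next = hasattr(keys, "__next__") # views have no __next__, so next(keys) fails
fresh = iter(keys)                   # each iter() call makes a separate iterator
is_same = fresh is keys

d["c"] = 3
after_update = list(keys)            # the view reflects changes to the dict

print(first, second, has_next, is_same, after_update)
```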
Back to the Sequence View idea, I need to write this up properly, but I'm thinking something like:
(using list as a concrete example)
list.view is a read-only property that returns an indexable object.
indexing that object with a slice returns a list_view object
a_view = list.view[a:b:c]
a_view is a list_view object
a list_view object is an immutable sequence. indexing it returns elements from the original list.
slicing a list_view returns ???? (I'm not sure what here -- it should probably be a copy, so a new list_view object referencing the same list? That will need to be thought out carefully.)
calling .view on a list_view is another trick -- does it reference the host view? or go straight back to the original sequence?
iter(a_list_view) returns a list_viewiterator.
iterating that gets you items from the "host" on the fly.
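To have something to poke at, here is a minimal pure-Python sketch of the idea. All the names (SeqView, view) are hypothetical, and since a property can't be added to the built-in list, a helper function view(seq) stands in for list.view. For the open question of slicing a view, this sketch picks the simplest answer, a view of the view, which is only one of the options.

```python
from collections.abc import Sequence


class SeqView(Sequence):
    """Hypothetical read-only view onto a slice of a host sequence (no copying)."""

    def __init__(self, host, view_slice):
        self._host = host
        self._slice = view_slice

    def _range(self):
        # Recomputed on each access, so the view follows changes in the host's length.
        return range(*self._slice.indices(len(self._host)))

    def __len__(self):
        return len(self._range())

    def __getitem__(self, index):
        if isinstance(index, slice):
            # One possible choice: slicing a view gives a view of this view.
            return SeqView(self, index)
        # range indexing handles negative indexes and raises IndexError when out of range.
        return self._host[self._range()[index]]


class _ViewMaker:
    """Indexable helper so that view(seq)[a:b:c] mimics the proposed seq.view[a:b:c]."""

    def __init__(self, host):
        self._host = host

    def __getitem__(self, view_slice):
        if not isinstance(view_slice, slice):
            raise TypeError("a view can only be created from a slice")
        return SeqView(self._host, view_slice)


def view(seq):
    return _ViewMaker(seq)


a_list = list(range(10))
a_view = view(a_list)[2:8:2]
print(list(a_view))          # items come from the host on the fly
a_list[4] = 99
print(a_view[1], list(a_view[::-1]))
```

Iteration here comes for free from the Sequence mixin, which walks the indexes until IndexError; a real implementation would presumably have its own iterator type.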
All this is a fair bit more complicated than my original idea -- which was to not have a full view, but simply an iterator you can get from slice notation.
But it would also open up a world of possibilities!
Christopher Barker, PhD
Python Language Consulting
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython