
Andrew Barnert writes:
The answer is that files are iterators, while lists are… well, there is no word.
As Chris B said, sure there are words: File objects are *already* iterators, while lists are *not*. My question is, "why isn't that instructive?"
Well, it’s not _completely_ not instructive, it’s just not _sufficiently_ instructive.
Language is more useful when the concepts it names carve up the world in the same way you usually think about it.
True. But that doesn't mean we need names for everything. In your "phases of matter" example, there are two characteristics, fluidity (which gases and liquids have, but solids don't) and compressibility (which gases have, but neither solids nor liquids do). Here the tripartite vocabulary makes sense, since they're orthogonal, and (in our modern world) all three concepts are everyday experience.
Yes, it’s true that we can talk about “iterables that are not iterators”. But that doesn’t mean there’s no need for a word.
True, but that also doesn't mean there *is* need for a word.
We don’t technically need the word “liquid” because we could always talk about “compressibles that are not solid” (or “fluids that are not gas”)
True, but neither "compressibles" nor "fluids" "is a thing". Instead, in everyday language "fluid" is pretty much synonymous with "liquid", and AFAIK there are no compressibles that aren't fluids, so "compressible" is pretty much purely an adjective. OTOH, it's useful to pick out each phase of matter separately. You haven't yet made an argument that it's useful to pick out "iterables that aren't iterators" separately, except that you believe that a word would help (which to me is evidence for the need, but not very strong evidence).
The reason I'm quite unpersuaded is that there's also a concept of marked vs unmarked in linguistics. Marked concepts are explicitly indicated; unmarked concepts require an explicit contrast with the marked concept, or they get folded into the generic word, leaving some ambiguity that gets resolved by context. (This can get really persnickety with no obvious rules even in the same domain. For example, with gender, "he" is unmarked, and you need to disambiguate "male person" from "person of unknown gender" from context, at least in traditional English grammar, while "she" is marked. By contrast, "male" and "female" are both unambiguous.[1])
Now, it seems to me that we are only ever going to discuss iterators in the context of iteration, which means our domain of discourse is pretty much restricted to iterables. (In the sense that there's nothing left to discuss about iteration once you've classed an entity as "not iterable".) Given the way iterable and iterator are defined, it seems perfectly reasonable to me that iterator would be marked, non-iterator iterable left to its own devices, and the word "iterable" disambiguated from context, or perhaps marked with some fairly clumsy modifier.
So how can one explain "the problem with re-iterating files"? Here's how I would (now that I've thought more about it than I should ;-):
Student: OK, so we use 'for' to iterate over lists etc. And it's cool that we can do "for line in file". But how come if I need to do it twice, with lists I can just use a new 'for' statement, but with files nothing useful happens?
Teacher: That's a good question. You know that "things we can use in a for statement" are called "iterables", right? Well, files are a special kind of iterable called "iterator", and you can "start them where you left off" with a new 'for' statement.
Student: But the 'for' statement runs out! You don't want to restart in the middle!
Teacher: Exactly! And that's why nothing useful happens when you use a second 'for' statement on an already-open file. But you can use 'break' to stop partway through.
Student: Huh? What's that good for?
Teacher: [Gives relevant example: paragraph-wise processing in text files with empty line paragraph breaks, message-wise processing in mbox files, etc.]
Student: Well, OK. But that's not what I expected or wanted.
Teacher: [Presses "play" on Rolling Stones tune cued up for this moment. Continues as voice-over.] True enough. I wasn't there when they designed this interface to files, so I'm not sure of all the reasons, but I do find it useful for the kind of processing I described earlier. Of course, you can get the effect you want by using 'open' again. It's a little annoying that *you* have to remember to do this. Also, there is a way to reset files the way you want. Just use the '.seek(0)' method on the file before the second 'for' statement.
Student: Hey, wait! Suppose I wanted to "restart where I left off" in iterating over a list. I guess that just doesn't work?
Teacher: [Wishes she had more students like this.] Another good question. If you want to do that, you have to construct an iterator from the list: 'lit = iter(l)'. Now iterate over 'lit', and you can break in the middle and restart with a new 'for' statement, just like with files. It's a little annoying that you have to remember ...
Student: [clobbers teacher with a handy copy of Python Essential Reference]
The point of the little dialogue is that although the word "iterator" is used, the student only has to remember it until the end of any sentence in which it's used. I think the student's responses are quite natural, and they don't mention "iterator". I suspect this student won't remember 'iter' but I bet she does remember '.seek(0)'.
On the other hand, what is there to explain *specifically* about iterables that aren't iterators that explaining about iterables doesn't do just as well? I guess there's the inverse of the "why doesn't it work with files?" question, but does that ever get asked? Surely almost all students encounter iteration over sequences first, and only later over iterators?
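For concreteness, here is a minimal sketch of what the teacher in the dialogue describes. It assumes io.StringIO as a stand-in for an open text file so that the snippet is self-contained; the variable names other than 'l' and 'lit' are made up for illustration.

```python
import io

# io.StringIO stands in for an open text file (an assumption, to keep the
# sketch self-contained). Like a real file object, it is its own iterator.
f = io.StringIO("a\nb\nc\n")

first_pass = [line.strip() for line in f]    # consumes the "file"
second_pass = [line.strip() for line in f]   # nothing left: already at the end
f.seek(0)                                    # rewind to the beginning
third_pass = [line.strip() for line in f]    # works again after the rewind

# A list begins again at the beginning with every new 'for' statement.
l = [1, 2, 3, 4]
for x in l:
    if x == 2:
        break
restarted = [x for x in l]                   # starts over from the first item

# An explicit iterator over the list resumes where it left off instead.
lit = iter(l)
for x in lit:
    if x == 2:
        break
resumed = [x for x in lit]                   # continues after the break
```

With a real file, calling 'open' a second time gives the same effect as '.seek(0)', as the teacher says.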
2. The *for* statement and the *next* builtin require an iterator object to work. Since for *always* needs an iterator object, it automatically converts the "in" object to an iterator implicitly. (Technical note: for the convenience of implementors of 'for', when iter is applied to an iterator, it always returns the iterator itself.)
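As a rough illustration of point 2, the implicit conversion can be spelled out by hand. The helper name for_each is hypothetical, and this is only an approximation of what the 'for' statement does (it ignores 'else' clauses and other details).

```python
def for_each(iterable, body):
    """Approximately what 'for item in iterable: body(item)' does."""
    it = iter(iterable)          # the implicit conversion to an iterator
    while True:
        try:
            item = next(it)      # 'for' only ever talks to the iterator
        except StopIteration:
            break                # the iterator is exhausted: loop ends
        body(item)

collected = []
for_each([1, 2, 3], collected.append)

# The technical note: iter() applied to an iterator returns that same object.
it = iter([1, 2, 3])
iter_returns_self = iter(it) is it
```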
I think this is more complicated than people need to know, or usually learn. People use for loops almost from the start, but many people get by with never calling next. All you need is the concept “thing that can be used in a for loop”, which we call “iterable”.
Conceded. "Had I only more time, I would have written a much shorter post."
“Iterable” is the fundamental concept.
We agree on this too.
Of course you will need to learn the concept “iterator” pretty soon anyway, but only because Python actually gives you iterators all over the place. [...] You want to know whether they can be used in for loops
I think now you are over-thinking this. Iterators *are* iterables. You have one because somebody told you it's iterable, and you want to use it in a 'for' loop. You only need to know that it's an iterator if you want to re-iterate from the beginning, rather than re-start from where you left off. "Iterator" is the marked case. But the "marker" is that you find out about it when it doesn't "do what I meant".
I think many people do get this, and that’s exactly what leads to confusion. They think that “lazy” and “iterator” (or “consumed when you loop over it”) go together. But they don’t.
I'll grant that my words admit such confusion, especially if people are predisposed to it. I think they are. After all, none of your "many people" have read my thoughts on the matter before this thread! Just as there are times when LBYL is the appropriate programming technique (even though EAFP is possible), sometimes people who don't read the whole relevant manual section in advance are going to get burned by their guesses and analogies (especially if they got them from others of the same type).
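To make the quoted point concrete: "lazy" and "consumed when you loop over it" are separate properties. A small sketch, using range as a lazy object that is not an iterator and zip as one that is:

```python
# range is lazy (no billion-element list is built) but it is not an iterator.
r = range(10**9)
length = len(r)                    # it knows its length
last = r[-1]                       # and supports indexing
first_twice = (next(iter(r)), next(iter(r)))   # each iter() starts afresh

try:
    next(r)                        # next() needs an iterator
    range_is_iterator = True
except TypeError:
    range_is_iterator = False

# zip is also lazy, but it *is* an iterator, so it is consumed by looping.
z = zip("ab", "xy")
zip_first = list(z)
zip_second = list(z)               # already exhausted
```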
Back to the discussion: the child can touch both, and does so frequently (assuming you don't feed them from the dog's bowl and also bathe them regularly). They've seen glasses break, most likely, and splashed water.
And someone learning Python does get to touch both things here. They get lists, dicts, and ranges, and they get files, zips, and enumerate. Both categories come up pretty early in learning Python, just like both solids and liquids come up pretty early in learning to be human.
No, they don't, in a sense I explained. Until the student has a use case where they need to restart (either where they left off or from the beginning) they can't tell the difference because they just put the whatever in a 'for' statement which works like magic -- and to them it is pure magic, because they don't know what iterable or iterator or __iter__ or iter or __next__ or next are. They just know you can use lists and some other things in a 'for' statement. The restart distinction may not come up for a long time. I didn't really have a use case for it, until one time I wanted to do something with mbox files and I didn't like what the mailbox module does. So I had to roll my own.
No, it’s iterables whose purpose is being fed to a for statement.
I disagree, both in the abstract (Sequences are iterable, but don't necessarily have an __iter__, and so I don't see how you can support your assertion that their purpose is to be fed to 'for') and in the concrete (lots of iterables with __iter__ are instantiated and never intended to be iterated, yet are useful). By contrast, every iterator has an __iter__, and the technical term for an iterator that is never iterated is "garbage".
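A sketch of the abstract point, using a made-up class (Squares is hypothetical) that defines only __getitem__. The iter() builtin falls back to the old sequence protocol, calling __getitem__ with 0, 1, 2, ... until IndexError is raised.

```python
from collections.abc import Iterable

class Squares:
    """Hypothetical sequence-like class with __getitem__ but no __iter__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)    # signals the end to the fallback protocol
        return i * i

sq = Squares(4)
has_dunder_iter = hasattr(sq, "__iter__")    # no __iter__ anywhere
abc_says_iterable = isinstance(sq, Iterable) # the ABC only looks for __iter__
values = [v for v in sq]                     # yet 'for' works anyway
again = list(sq)                             # and it can be re-iterated
```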
Yes, iterators are what for statements use under the covers to deal with iterables, but you don’t need to learn that until well after you’ve learned that iterators are what you get from open and zip.
True enough, my bad. I was confounding two documentation problems there. One is teaching new users, and the other is helping experts get it exactly right. I've mixed them up quite a bit, but my list of 5 points should be thought of as aimed at a concise but comprehensive description rather than a tutorial.
You don’t have to call them “file iterators”, you just have to have the word “iterator” lying around to teach them when they ask why they can’t loop over a file twice. Which we do.
Eh, that's my argument. :-)
In the same way, you don’t need to call lists “list iterables”[.]
And there's no way that I would. "Iterable" is an adjective. The usage "iterables" for the class of iterable objects is something of an abuse.[2] My point about files is that they're the thing I would expect would be most folks' first unpleasant encounter with an exhausted iterator object, and by naming them as "file iterators" you might be able to induce a lot of "a ha!" moments. You come around to a related suggestion below. I admit that the "file iterator" suggestion is pretty implausible.
You just need to have the word “iterable” lying around to teach them when they ask what other kinds of things can go in a for loop.
I don't think you meant to write that: when they ask that, you don't say "iterables, of course", you say "tuples, sets, and perhaps surprisingly dicts, as well as dict views, and many other things." It's only when you or the student need a name for that whole class that you bring up the term "iterable" (at least in its noun form). But I don't think that comes up, at least on the student side, for quite a while. A good student might ask "what else is iterable?" but "What else can I use in a 'for' statement?" is perfectly serviceable. I suppose the teacher might find it painful to completely avoid the term "iterable" (especially as an adjective, and "iterator", for that matter), but I would solve that problem as in the dialog: just use them in such a way that the student doesn't need to remember them. I think that's quite do-able, even natural. I do not claim this leaves the student with a complete and satisfactory understanding of the concept of iterator, merely that it allows them to understand the difference between iterables that start from where they left off and those that begin again at the beginning.
And you don’t need to call lists “list collections”, you just need to have the word “collection” lying around to teach them when they ask why ranges and lists and dicts let you loop over their values over and over.
Have you ever been asked that, outside of the context of explaining why files, zips, etc. don't allow re-iteration from the start? Has anyone come to you puzzled because the second loop over a list did useful work?
We have that word and distinction. A file object *is* an iterator. A list is *not* an iterator. *for* works *with* iterators internally, and *on* iterables through the magic of __iter__.
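The distinction can also be checked mechanically. The sketch below uses the abstract base classes in collections.abc, with io.StringIO assumed as a stand-in for an open file; the dictionary names are made up for illustration.

```python
import io
from collections.abc import Iterable, Iterator

samples = {
    "list": [1, 2],
    "dict": {"a": 1},
    "range": range(3),
    "file-like": io.StringIO("x\n"),   # stand-in for an open file
    "zip": zip("ab", "cd"),
    "enumerate": enumerate("ab"),
}

# For each sample: (is it iterable?, is it an iterator?)
kinds = {
    name: (isinstance(obj, Iterable), isinstance(obj, Iterator))
    for name, obj in samples.items()
}

# Everything here is iterable; only some of the objects are iterators.
iterators = sorted(name for name, (_, is_it) in kinds.items() if is_it)
non_iterators = sorted(name for name, (_, is_it) in kinds.items() if not is_it)
```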
“Not an iterator” is not a word. Of course you _can_ talk about things that don’t have names by being circuitous, but it’s harder.
Or you can not talk about them at all. This is very frustrating, because I agree with everything you say as a general principle, but your concrete discussion never refers to iterators or iterables. It's always an analogy to birds and reptiles and plasmas and liquids. I think that analogy breaks down because I doubt new programmers get confused by the fact that they can re-iterate over lists. Like, not ever. I'd even bet that students who try breaking out, then restarting where they left off, and have it fail by restarting from the beginning, are disappointed but not shocked. So when do you *need* to talk about non-iterator iterables? Outside of threads like this one?
And in practice, people do need to think about “things that can be looped over repeatedly and give you their values over and over”, and having to say “iterables that are not iterators” may be technically sufficient, but practically it makes communication and thought harder.
Or you can just treat "things that can be looped over repeatedly and give you their values over and over" as the unmarked case of "iterable", and speak of "iterators" when you need to distinguish the marked case.[3] Use of "marking" is something we do all the time. I can't say for sure that it would work here, but nothing you've written yet convinces me it wouldn't.
It means we have to be more verbose and less to the point,
It doesn't mean we *have* to be more verbose, in principle. "Marking" works fine in natural language, just as anaphoric "it" does. I may be missing something, but you need to be more concrete about what the need for this word (yet to be named) is.
and people make silly mistakes like the one in the parent thread, and people make more serious mistakes like teaching others that ranges are iterators,
Indeed they do. I don't think that has as much to do with people not having a word for iterables that aren't iterators as it does with them not understanding what an iterator is. Just because you have a word, say "nandaro", for iterables that aren't iterators doesn't mean that otherwise well-informed people will correctly classify ranges as nandaro rather than incorrectly as iterators. As far as I can tell, most of the rest of your post addresses an argument that I'm not making, and I don't know how to do it better, so I'm just going to let it rest there. As mentioned above, this captures a good bit of what I'm trying to get at:
On the other hand, this would certainly get the notion of “files are streams” across to novices (as opposed to people coming from other languages) faster and more easily than we do today, which might help a lot of them. It might even turn out to solve the “why can’t I loop over this file twice” question for a lot of people in a different way, and that different way might be something you could build on to explain the difference between zip and range. “Like a stream” is much more accurate than “because it wants to be lazy”, and maybe easier to understand as well.
Footnotes: [1] Or maybe "marked" doesn't apply here because those words are on equal footing -- I'm not a linguist, I've just heard the concept discussed by real linguists. [2] Linguists have a technical term for this kind of "abuse" but I don't remember it. [3] I recognize that you can create objects that break this dichotomy. I doubt they're important enough to impede discussion for lack of the word for "non-iterator iterables". Again, concrete examples would really help.