
On May 12, 2020, at 23:29, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andrew Barnert writes:
On May 10, 2020, at 22:36, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andrew Barnert via Python-ideas writes:
A lot of people get this confused. I think the problem is that we don’t have a word for “iterable that’s not an iterator”,
I think part of the problem is that people rarely see explicit iterator objects in the wild. Most of the time we encounter iterator objects only implicitly.
We encounter iterators in the wild all the time, we just don’t usually _care_ that they’re iterators instead of “some kind of iterable”, and I think that’s the key distinction you’re looking for.
It *is* the distinction I'm making with the word "explicit". I never use "next" on an open file. I'm not sure your more precise statement is better.
I think the real difference is that I'm thinking of "people" as including my students who have no clue what an iterator does and don't care what an iterable is, they just cargo cult
    with open("file") as f:
        for line in f:
            do_stuff(line)
while as you point out (and I think is appropriate in this discussion) some people who are discussing proposed changes are using the available terminology incorrectly, and that's not good.
Students often want to know why this doesn’t work:

    with open("file") as f:
        for line in f:
            do_stuff(line)
        for line in f:
            do_other_stuff(line)

… when this works fine:

    with open("file") as f:
        lines = f.readlines()
        for line in lines:
            do_stuff(line)
        for line in lines:
            do_other_stuff(line)

This question (or a variation on it) gets asked by novices every few days on StackOverflow; it’s one of the top common duplicates. The answer is that files are iterators, while lists are… well, there is no word.

You can explain it anyway. In fact, you _have_ to give an explanation with analogies and examples and so on, and that would be true even if there were a word for what lists are. But it would be easier to explain if there were such a word, and if you could link that word to something in the glossary, and a chapter in the tutorial.
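The two cases above can be reproduced without touching the filesystem. This is only a minimal sketch: it assumes io.StringIO is an acceptable stand-in for a file opened with open(), and the sample text and variable names are made up for illustration.

```python
import io

# Assumption: io.StringIO stands in for an opened text file so this runs anywhere.
f = io.StringIO("a\nb\nc\n")

# First loop consumes the file object; the second loop finds nothing left.
first_pass = [line for line in f]
second_pass = [line for line in f]

# Rewind, then read everything into a list. The list can be looped over repeatedly.
f.seek(0)
lines = f.readlines()
list_first = [line for line in lines]
list_second = [line for line in lines]

print(first_pass, second_pass)
print(list_first == list_second)
```

The second loop over the file object comes up empty because the same iterator is being reused, while each loop over the list gets a fresh iterator.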
Still, having clear names with simple definitions would help that problem without watering down the benefits.
I disagree. I agree there's "amortized zero" cost to the crowd who would use those names fairly frequently in design discussions, but there is a cost to the "lazy in the technical sense" programmer, who might want to read the documentation if it gave "simple answers to simple questions", but not if they have to wade through a thicket of "twisty subtle definitions all alike" to get to the simple answer, and especially not if it's not obvious after all that what the answer is.
We shouldn’t define everything up front, just the most important things. But this is one of the most important things. People need to understand this distinction very early on to use Python, and many of them don’t get it, hence all the StackOverflow duplicates. People run into this problem well before they run into a problem that requires them to understand the distinction between arguments and parameters, or protocols and ABCs, or Mapping and dict.
It also makes conversations with experts fraught, as those experts will tend to provide more detail and precision than the questioner wants (speaking for myself, anyway!) "Not every one-sentence explanation needs terminology in the documentation."
I think it’s the opposite. I can teach a child why a glass will break permanently when you hit it while a lake won’t by using the words “solid” and “liquid”. I don’t have to give them the scientific definitions and all the equations. I might not even know them. And in the same way, I can teach novices why the x after x=y+1 doesn’t change when y changes by teaching them about variables without having to explain __getattr__ and fast locals and the import system and so on. Knowing all the subtleties of shear force or __getattribute__ or whatever doesn’t prevent me from teaching a kid without getting into those subtleties. The better I understand “solid” or “variable”, the easier it is for me to teach it. That’s how words work, or how the human mind works, or whatever, and that’s why language is useful for teaching.
But that last thing is exactly the behavior you expect from “things like list, dict, etc.”, and it’s hard to explain, and therefore hard to document.
Um, you just did *explain* it, quite well IMHO, you just didn't *name* it. ;-)
Well, it was a long, and redundant, explanation, not something you’d want to see in the docs or even a PEP.
The part I was referring to was the three or so lines preceding in which you defined the behavior desired for views etc. I guess to define terminology for all the variations that might be relevant would be long (and possibly unavoidably redundant).
Yes, and defining terminology for the one distinction that almost always is relevant helps distinguish that distinction from the other ones that rarely come up. Most people (especially novices) don’t often need to think about the distinction between iterables that are sized and also containers vs. those that are not both sized and containers, so the word for that doesn’t buy us much. But the distinction between iterators and things-like-list-and-so-on comes up earlier, and a lot more often, so a word for that would buy us a lot more.
Isn't manual reset exactly what you want from a resettable iterator, though?
Yes. I certainly use seek(0) on files, and it’s a perfectly cromulent concept, it’s just not the concept I’d want on a range or a keys view or a sequence slice.
But you *don't* use seek(0) on files (which are not iterators, and in fact don't actually exist inside of Python, only names for them do). You use them on opened *file objects* which are iterators.
A file object is a file, in the same way that a list object is a list and an int object is an int. Sure, those are all abstractions, and some are quite vague, and occasionally it’s worth talking specifically about Python’s implementation of the abstraction. An int doesn’t have a storage cost; an int object does. A file doesn’t have a fileno, a file object does. But so what? The fact that we use “file” ambiguously for a bunch of related but contradictory abstractions (a stream that you can read or write, a directory entry, the thing an inode points to, a document that an app is working on, …) makes it a bit more confusing, but unfortunately that ambiguity is forced on people before they even get to their first attempt at programming, so it’s probably too late for Python to help (or hurt).
When you open a file again, by default you get a new iterator which begins at the beginning, as you want for those others. My point is that none of the other types you mention are iterators.
I don’t get what you’re driving at here. Lists, sets, ranges, dict_keys, etc. are not iterators. You can write `for x in xs:` over and over and get the values over and over. Because each time, you get a new iterator over their values. Files, maps, zips, generators, etc. are not like that. They’re iterators. If you write `for x in xs:` twice, you get nothing the second time, because each time you’re using the same iterator, and you’ve already used it up. Because iter(xs) is xs when it’s a file or generator etc.
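The test described here, whether iter(xs) is xs, can be checked directly. A minimal sketch: is_iterator is a hypothetical helper name, and io.StringIO is assumed as a stand-in for an opened file.

```python
import io

def is_iterator(obj):
    # Hypothetical helper: an iterator hands back itself from iter().
    return iter(obj) is obj

results = {
    "list": is_iterator([1, 2, 3]),
    "set": is_iterator({1, 2, 3}),
    "range": is_iterator(range(3)),
    "dict_keys": is_iterator({"a": 1}.keys()),
    "map": is_iterator(map(str, [1, 2])),
    "zip": is_iterator(zip([1], [2])),
    "generator": is_iterator(x for x in range(3)),
    "file-like": is_iterator(io.StringIO("a\n")),
}

for name, answer in results.items():
    print(f"{name}: {answer}")
```

The first four report False because each iter() call builds a new iterator; the last four report True because they are their own iterators.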
The difference with files is just that they happen to exist in Python as iterables. But after
_What_ exists in Python as iterables? The only representation of files in Python is file objects—the thing you get back from open (or socket.makefile or io.StringIO or whatever else)—and those are iterators.
    r = range(n)
    ri = iter(r)
    for i in ri:
        if i > n_2: break
you want the next "for j in ri:" to start where you left off, no?
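For concreteness, here is a sketch of what that snippet does when it is actually run. The values n = 10 and n_2 = 4 are assumptions chosen only for illustration; the thread does not give them.

```python
n = 10    # assumed value, not from the thread
n_2 = 4   # assumed value, not from the thread

r = range(n)
ri = iter(r)

seen = []
for i in ri:
    seen.append(i)
    if i > n_2:
        break

# The explicit iterator remembers its position, so this picks up after the break.
rest = [j for j in ri]

# Looping over the range itself starts from the beginning every time.
again = [j for j in r]

print(seen, rest, again)
```

The value that triggered the break has already been consumed from ri, so it shows up in seen and not in rest.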
Yes. That’s why you called iter, after all. Because doing `for i in r:` twice would _not_ start where you left off. Because a range is not an iterator. But file isn’t like that—you don’t have to call iter on it to get an iterator; in fact, if you write fi=iter(f), fi is the same object as f. Because a file is an iterator.

Of course you can also get a new range with r=range(n) again, but you don’t have to, because one range(n) is as good as another. But one range_iter is not as good as another, because there’s no way to use one without using it up. And files aren’t like ranges, they’re like range_iters.

Compare these:

    xs = [x*2 for x in range(10)]
    ys = (y*2 for y in range(10))

Of course you can sort of iterate over ys twice by just running the same generator expression again to get a brand new object, but that’s not the same thing as iterating over xs twice. That’s not “resetting the iterator”, it’s creating a brand new one. In the same way, you can sort of iterate over a file twice just by running the expression that created it twice, but that’s not resetting the file object, it’s creating a new one.

The one difference between files and generators is that you can actually reset the file object by calling seek(0). But that doesn’t make file not an iterator. It just makes file an iterator with an extra feature that most iterators don’t have.

If “resettable iterator” means anything useful, it means something like file. Claiming that dict_keys is a “resettable iterator” because you can iterate over it twice is massively confusing, because it’s not an iterator at all, it’s the exact same kind of thing as a list or a range. And I’m pretty sure that’s exactly the confusion that led you to think that dict_keys have weird behavior, and to suggest the same weird behavior for sequence views.
Like thinking you can’t have two different iterators over the dict_keys that point to different positions—if it were an iterator, that would be true (notice that it’s true of files—if you call iter on a file twice, they will always have the same position, because they’re both actually the same object as file itself), but because dict_keys is not an iterator, it’s not true.
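The contrast between recreating, resetting, and holding independent positions can be sketched as follows. As before, io.StringIO is assumed as a stand-in for an opened file, and the data and variable names are illustrative only.

```python
import io

# Generator: once used up, the only way to "go again" is to build a new object.
ys = (y * 2 for y in range(3))
first = list(ys)
leftover = list(ys)            # nothing left in the same object
ys_new = (y * 2 for y in range(3))
fresh = list(ys_new)           # a brand new generator, not a reset

# File-like object: seek(0) rewinds the very same iterator.
f = io.StringIO("x\ny\nz\n")
all_lines = list(f)
f.seek(0)
f1, f2 = iter(f), iter(f)
same_object = f1 is f and f2 is f
line_a = next(f1)              # advances the shared position
line_b = next(f2)              # continues from where f1 left off

# dict_keys: each iter() call gives an independent iterator.
keys = {"a": 1, "b": 2, "c": 3}.keys()
k1, k2 = iter(keys), iter(keys)
next(k1)
pos1 = next(k1)                # second key
pos2 = next(k2)                # still the first key

print(first, leftover, fresh)
print(same_object, line_a, line_b)
print(pos1, pos2)
```

The two "iterators" over the file-like object share one position because they are one object, while the two iterators over the keys view advance separately.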