On May 13, 2020, at 12:40, Christopher Barker <pythonchb@gmail.com> wrote:

I hope you don’t mind, but I’m going to take your reply out of order to get the most important stuff first, in case anyone else is still reading. :)

Back to the Sequence View idea, I need to write this up properly, but I'm thinking something like:

(using a concrete example of list)

list.view is a read-only property that returns an indexable object.
indexing that object with a slice returns a list_view object

a_view = list.view[a:b:c]

a_view is a list_view object

a list_view object is an immutable sequence. indexing it returns elements from the original list.
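A rough pure-Python sketch of the behaviour described so far. A built-in list cannot grow a `view` property from user code, so this uses a hypothetical helper function `view(seq)` and hypothetical classes `_Viewer` and `SliceView`; none of these names come from the proposal, and what slicing a view should return is deliberately left out here.

```python
from collections.abc import Sequence


class SliceView(Sequence):
    """Hypothetical read-only view onto a slice of a host sequence."""

    def __init__(self, host, indices):
        self._host = host
        self._indices = indices  # a range of positions in the host

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, i):
        if isinstance(i, slice):
            raise TypeError("slicing a view is left open in this sketch")
        # Look the element up in the host at access time; nothing is copied.
        return self._host[self._indices[i]]


class _Viewer:
    """Hypothetical stand-in for the proposed ``list.view`` property."""

    def __init__(self, host):
        self._host = host

    def __getitem__(self, s):
        if not isinstance(s, slice):
            raise TypeError("view[...] expects a slice")
        # range(...)[s] turns the slice into concrete host positions.
        return SliceView(self._host, range(len(self._host))[s])


def view(seq):
    return _Viewer(seq)


lst = list(range(10))
a_view = view(lst)[2:8:2]
print(list(a_view))   # [2, 4, 6]
lst[4] = "changed"
print(list(a_view))   # [2, 'changed', 6]
```

The positions are fixed when the view is created, so this sketch would raise IndexError if the host list later shrank; how a real implementation should handle that is an open question.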

Can we just say that it returns an immutable sequence that blah blah, without defining or naming the type of that sequence?

Python doesn’t define the types of most things you never construct directly. (Sometimes there is a public name for it buried away in the types module, but it’s not mentioned anywhere else.) Even the dict view objects, which need a whole docs section to describe them, never say what type they are.

And I think this is intentional. For example, nowhere does it say what type function.__get__ returns, only what behavior that object has—and that allowed Python 3 to get rid of unbound methods, because a function already has the right behavior. And nobody even notices that list and tuple use the same type for their __iter__ in some Python implementations but not others. Similarly, I think dict.__iter__() used to return a different type from dict.keys().__iter__() in CPython but now they share a type, and that didn’t break any backward compatibility guarantees.

And it seems there’s no reason you couldn’t use the same generic sequence view type on all sequences, but also it’s possible that a custom one for list and tuple might allow some optimization (and even more likely so for range, although it may be less important). So if you don’t specify the type, that can be left up to each version of each implementation to decide.

slicing a list view returns ???? I'm not sure what here -- it should probably be a copy, so a new list_view object referencing the same list? (That will need to be thought out carefully.)

Good question. I suppose there are three choices: (1) a list (or, in general, whatever the original object returns from slicing), (2) a new view of the same list, or (3) a view of the view of the list.

I think I agree with you here that (2) is the best option. In other words, lst.view[2::2][1::3] gives you the exact same thing as lst.view[4::6].

At first that sounds weird because if you can inspect the attributes of the view object, there’s no way to see that you did a [1::3] anywhere.

But that’s exactly the same thing that happens with, e.g., range(100)[2::2][1::3]. You just get range(4, 100, 6), and there’s no way to see that you did a [1::3] anywhere.
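The flattening can be checked directly with built-in types. This small example only uses existing behaviour of range and memoryview; the variable names are illustrative.

```python
r = range(100)[2::2][1::3]
print(r)                       # range(4, 100, 6)
print(r == range(100)[4::6])   # True

# memoryview makes the same choice: the nested slice still refers
# straight to the original buffer, with a combined stride.
data = bytearray(range(100))
m = memoryview(data)[2::2][1::3]
print(m.obj is data)           # True
print(m.strides)               # (6,)
```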

And the same is true for memoryview, and for numpy arrays and bintrees tree slices—despite them being radically different things in lots of other ways, they all made the same choice here. And even beyond Python, it’s what slicing a slice view does in Swift (even though other kinds of views of views don’t “flatten out” like this, slice views of slice views do), and in Go. (Although C++20 is a counterexample here.)

calling .view on a list_view is another trick -- does it reference the host view? or go straight back to the original sequence?

I think it’s the same answer again. In fact, I think .view on any slice view should just return self.

Think about it: whether you decided that lst.view[2::2][1::3] gives lst.view[4::6] or a nested view-of-a-view-of-a-list, it would be confusing if lst.view[2::2].view[1::3] gave you the other one, and what other options would make sense? And, unless there’s some other behavior besides slicing on view properties, if self.view slices the same as self, it might as well just be self.
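One way option (2) and the ".view returns self" rule might fit together, as a sketch only. The class name `FlatView` and its attributes are hypothetical; the idea is that a view stores just the host and a range of host positions, so slicing a view only has to slice that range.

```python
from collections.abc import Sequence


class FlatView(Sequence):
    """Hypothetical view holding only the host and a range of positions."""

    def __init__(self, host, indices=None):
        self.host = host
        self.indices = range(len(host)) if indices is None else indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Option (2): slice the positions, keep pointing at the host.
            return FlatView(self.host, self.indices[i])
        return self.host[self.indices[i]]

    @property
    def view(self):
        # Slicing self.view would behave exactly like slicing self.
        return self


lst = list(range(100))
nested = FlatView(lst)[2::2][1::3]
direct = FlatView(lst)[4::6]
print(nested.indices, direct.indices)   # range(4, 100, 6) range(4, 100, 6)
print(nested.host is lst)               # True
print(nested.view is nested)            # True
```

Because range already composes slices this way, the view never needs to remember how it was built.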

iter(a_list_view) returns a list_viewiterator.

Here, it seems even more useful to leave the type unspecified. For list (and tuple) in CPython, I’m not sure if you can get away with using the special list_iterator type used by list and tuple (which accesses the underlying array directly), or, if not that, the PySeqIter type used for old-style iter-by-indexing, but if you can, it would be both simpler and more efficient. And similarly, range.view might be able to use the range_iterator type. Or, if you can’t do that, a generic PyIter around tp_iternext would be less efficient than a custom type, but again simpler, and the efficiency might not matter. Or, if you just had a single sequence view type rather than custom ones for each sequence type, that would obviously mean a single iterator type. And so on. That all seems like quality-of-implementation stuff that should be left open to whatever turns out to be best.

iterating that gets you items from the "host" on the fly.

All this is a fair bit more complicated than my original idea -- which was to not have a full view, but simply an iterator you can get from slice notation. 

But it would also open up a world of possibilities!

Yes, in the same way that range (and 2.x xrange) is more complicated but more useful than a hypothetical irange and 3.x dict.keys() (and 2.7 dict.viewkeys()) is more complicated but more useful than 2.6 dict.iterkeys(). I think it’s worth it, but it is a trade off.
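To make the trade-off concrete, here is what the existing dict.keys() view offers over a plain iterator. This uses only current built-in behaviour; the variable names are illustrative.

```python
d = {"a": 1, "b": 2}

keys = d.keys()       # a view: sized, re-iterable, and live
it = iter(d)          # an iterator: one pass only

first = list(keys)
second = list(keys)   # same contents again

consumed = list(it)
leftover = list(it)   # [] because the iterator is exhausted

d["c"] = 3
print("c" in keys, len(keys))   # True 3
```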

Now onto the stuff that probably nobody else cares about:

It took me a good while to "get" the distinction between an iterator and an iterable, and I still misuse those terms sometimes.

Maybe because iterable is an awkward word (that my spell checker doesn't recognize)?

My spellchecker is happy with Iterable with a capital I (because it’s seen me type so much Python code?) but complains about iterable with a lowercase i. Or just autocorrects it—sometimes to capital-I Iterable, sometimes to utterable. (Which I wouldn’t think is a word that comes up often enough in anyone’s usage to be a common autocorrect target. Maybe unutterable, but even then only if you’re talking about Lovecraftian horror or religious mysticism.)

But it's also because there is a clear definition for "Iterator" in Python, but the term is used a bit more generally in vague CS nomenclature.

Yes. And in different languages, too. In C++, iterators are an abstraction of pointers; in OCaml they’re an abstraction of HOFs like map; worst of all, Swift built everything around these three concepts they call “sequence”, “iterator”, and “generator”, clearly aimed at getting the best of both worlds from Python and C++, but all of those concepts mean the wrong thing if you’re coming from either language, and then they changed things between 1.0 and 2.0 just in case anyone wasn’t confused yet.

The other confusion is that an iterable is not an iterator, but iterators are, in fact, iterables (i.e. you can call iter() on them).

Yes. Which is essential to a lot of things about Python’s design, but not essential to the concept at an abstract CS level.

I think this is mostly the result of the "for loop" protocol pre-dating the iteration protocol, and wanting to have the same nifty way to iterate everything. That is -- we want to be able to use iterators in for loops, and not have to call iter() in anything before using a for loop. But in fact, I think this is a nice convenience, and maybe one that would be kept in a new language anyway -- it's really handy that you can do A LOT without knowing about iter() and next() and StopIteration, while those tools are still there when needed.

I’m not sure about that. There are at least two ways to design a language that doesn’t need both concepts, and both have been tried, even if nobody’s been quite successful yet.

The first is the C++ way: just put iterators front and center and make people call iter (or, in their case, begin and end) all over the place. This is pretty easy to understand, and it has some nice advantages (like being able to loop over C strings and arrays without wrapping them). It’s just not actually usable in everyday code unless you start layering a bunch of stuff on top of it, at which point you’ve only avoided the concept of “iterable” by making people learn the concept of “implicitly convertible to iterator range” instead.

The second is the Swift way (I’m going to use Python terms rather than Swift ones here to avoid confusion): hide iterators as much as possible. (Java and C# are also gradually moving in this direction, but have a lot more legacy weighing them down.) In Swift, you can’t loop over iterators, or pass them to functions like map—and that’s fine, because functions like map don’t return iterators, they return views. The only place you ever see an iterator in the wild is inside the implementation of a handful of functions like map and zip that really do need to munge iterators manually, and many people will never even read, much less write, such a function. If you do happen to get an iterator somehow and want to use it as an iterable, you have to wrap it in a trivial view object that delegates to it, but this almost never comes up. Sadly, this makes it so much harder to write your stdlib that Apple took three tries (after going public) before they got it right.

Some day, someone probably will design a language that doesn’t require most people to learn both concepts and is actually usable. Until then, I’m happy we’ve got Python. :)

Bringing this back to the original topic:

I suppose we *could* have a "file_view" object that acted like the list you get from readlines(), but actually called seek() on the underlying file to give you the lines lazily one at a time. That would be, shall we say, problematic, performance wise, but it could be done.

I remember learning that the way to do this was the nifty new linecache module. Nobody seems to teach that anymore in the 3.x days, but it’s still there, and works as expected for Unicode text and everything.
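For anyone who hasn't seen it, a minimal linecache example. The temporary file and variable names are just for illustration. Note that linecache reads and caches the whole file on first access rather than seeking around in it, so it gives you random access to lines, not laziness.

```python
import linecache
import os
import tempfile

# Write a small throwaway file to look lines up in.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

third = linecache.getline(path, 3)      # line numbers start at 1
first = linecache.getline(path, 1)
missing = linecache.getline(path, 99)   # '' rather than an exception

print(repr(first), repr(third), repr(missing))

linecache.clearcache()
os.remove(path)
```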

But for something more general, you probably wouldn’t want to bother with a special file view. You can very easily write a generic view that takes _any_ iterator and looks like a sequence, pulling and caching the elements on demand. At a certain point, a lot of people think they want this, then you show them how easy it is to build that, and they think it’s cool—but they never use it again. Caching indices instead of the actual lines seems like a nice optimization, but you’d need a specific use case where the time cost is worth the space savings, and if nobody even uses the generic version, nobody needs to optimize it, right? :)
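Something like the following is roughly what I mean by a generic caching view. It is a sketch, not a polished implementation: the class name `CachedSeq` is made up, it supports integer indexing only, and asking for the length or a negative index has to drain the source.

```python
class CachedSeq:
    """Hypothetical sequence-like wrapper around any iterable."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def _fill(self, n):
        # Pull items until the cache holds n of them or the source runs dry.
        while len(self._cache) < n:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                break

    def __len__(self):
        self._cache.extend(self._it)   # drains the source
        return len(self._cache)

    def __getitem__(self, i):
        if not isinstance(i, int):
            raise TypeError("this sketch supports integer indexes only")
        if i < 0:
            i += len(self)             # negative indexes need the full length
            if i < 0:
                raise IndexError("index out of range")
        self._fill(i + 1)
        return self._cache[i]          # IndexError past the end, like a list


squares = CachedSeq(n * n for n in range(10))
print(squares[3])            # 9
print(len(squares._cache))   # 4: only the items needed so far were pulled
print(list(squares) == list(squares))   # True: it can be iterated repeatedly
```

Iteration works through the old-style indexing protocol, which is why no `__iter__` is needed here.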

And now on to the stuff that maybe you don’t even care about:

On Wed, May 13, 2020 at 10:52 AM Andrew Barnert <abarnert@yahoo.com> wrote:
On May 12, 2020, at 23:29, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
>>>> A lot of people get this confused. I think the problem is that we
>>>> don’t have a word for “iterable that’s not an iterator”,

isn't that simply an "Iterable" -- as above, yes, all iterators are iterables, but when we speak of iterables specifically, we are usually referring to the ones that are not iterators.

No, we really aren’t. Iterators being iterable is not just a weird quirk that rarely comes up; it’s essential to things you do every day.

The everyday concept behind “iterable” is “something you can use in a for loop”. (You don’t have to get into the technical “something you can call iter on and get an iterator” that often—but when you do, it’s easy to work out that they’re identical concepts anyway.)

The main thing you do with generator expressions, zips, etc. is not call next and check StopIteration, it’s stick them in a for loop (or generator expressions or map or whatever), exactly the same way you use lists and sets and ranges. So if you think of the word “iterable” in a way that doesn’t include generators and zips and so on, you’re just going to confuse yourself.

> It *is* the distinction I'm making with the word "explicit".  I never
> use "next" on an open file.

nor do I, but there was a conversation on this list a while back, with folks saying that they DID do that.

This is your mail agent being a pain again. You’re the one who said that, I quoted you saying it, and now you’re agreeing with yourself. Can we pass a law that anyone who’s worked on any of the major current mail clients is not allowed to work in software anymore? I think that would benefit the world more than any change we can make to Python…

Personally, I actually do next files. For example:

    import csv

    with open(path) as f:
        next(f)  # skip the first line of the 2-line header
        for row in csv.DictReader(f):
            ...  # handle each row here

Of course I could have used f.readline() just as well, and I’ve seen as many people do the same thing with readline as with next. It just seems a little more unusual to ignore the result of readline than to ignore the result of next, so when writing it, next feels more natural.

> Students often want to know why this doesn’t work:

    with open("file") as f:
        for line in f:
            do_stuff(line)
        for line in f:
            do_other_stuff(line)

… when this works fine:

    with open("file") as f:
        lines = f.readlines()
    for line in lines:
        do_stuff(line)
    for line in lines:
        do_other_stuff(line)

This question (or a variation on it) gets asked by novices every few days on StackOverflow; it’s one of the top common duplicates.

The answer is that files are iterators, while lists are… well, there is no word.
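The difference is easy to see without touching the filesystem. In this small demonstration io.StringIO stands in for an open text file; the variable names are illustrative.

```python
import io

f = io.StringIO("a\nb\n")   # stands in for an open text file
print(iter(f) is f)         # True: the file is its own iterator

first_pass = [line for line in f]
second_pass = [line for line in f]   # [] because the file is used up

f.seek(0)
lines = f.readlines()
print(iter(lines) is lines)   # False: each loop gets a fresh iterator
print(list(lines) == list(lines))   # True
```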

yes, there is -- they are "lists" :-) -- but if you want to be more general, they are Sequences.

But that’s the wrong generalization. Because sets also work the same way, and they aren’t Sequences. Nor are dict views, or many of the other kinds of things-that-can-be-iterated-over-and-over-independently.

Plus, this just confuses what Sequences are about. Sequence is a dead simple concept: if seq[0] makes sense, it’s a sequence; if not, it isn’t.

(Sure, there’s other stuff crammed in there, like being reversible and in-testable and index-searchable, but all of that stuff is stuff you can obviously and trivially build on top of indexing, so you don’t need to think about it. And there’s the subtlety that 0 is a perfectly cromulent dict key, which unfortunately you do sometimes need to think about, but most of the time you don’t. For the most part, Sequence means you can index it.)
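A quick illustration of how much comes for free from indexing, using collections.abc.Sequence. The class `Countdown` is a made-up example; it defines only indexing and a length, and the ABC supplies the rest.

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Hypothetical sequence n, n-1, ..., 1 defined only by indexing."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if not isinstance(i, int):
            raise TypeError("integer indexes only in this sketch")
        if i < 0:
            i += self._n
        if not 0 <= i < self._n:
            raise IndexError(i)
        return self._n - i


c = Countdown(5)
print(list(c))             # [5, 4, 3, 2, 1]
print(list(reversed(c)))   # [1, 2, 3, 4, 5]
print(3 in c)              # True
print(c.index(2))          # 3

# The dict subtlety: d[0] works, but a dict is still not a Sequence.
d = {0: "zero"}
print(d[0], isinstance(d, Sequence))   # zero False
```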

Or heck, simply say that readlines() reads the whole file at once into a list, and the file object has nothing to do with it anymore. Whereas looping through the lines in a for loop is getting the lines one by one from the file object, so once you've gotten them all, there are no more.

Which doesn't require me talking about iterators or iterables, or iter() or next()

Sure, which is great right up until they ask the same question about why they can’t iterate twice over a map or zip. (Which is another very common novice dup on StackOverflow. It’s especially sad when they made a commendable start at debugging things on their own by writing `for pair in pairs: print(pair)`, which instead of rewarding them just made the problem even worse.)

Or why they _can_ iterate twice over a range, even though a range clearly isn’t building a whole list in advance. (Especially when they read in some blog that range used to return a list but now it doesn’t. Especially if the person writing that blog misused the word “iterator” in the same way you did earlier, which many of them do.)
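Both situations in a few lines, using only built-in behaviour (the variable names are illustrative):

```python
pairs = zip("ab", [1, 2])
for pair in pairs:
    print(pair)              # the debugging loop consumes the zip
after_debug = list(pairs)    # [] so the "real" loop sees nothing

r = range(3)
once, twice = list(r), list(r)   # both [0, 1, 2], and no list was prebuilt

print(iter(pairs) is pairs)   # True: a zip object is an iterator
print(iter(r) is r)           # False: a range hands out fresh iterators
```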

You can explain it anyway. In fact, you _have_ to give an explanation with analogies and examples and so on, and that would be true even if there were a word for what lists are. But it would be easier to explain if there were such a word, and if you could link that word to something in the glossary, and a chapter in the tutorial.

Still not sure why "Sequence" doesn't work here? Granted, there *are* some "iterables that aren't iterators" that aren't Sequences (like dict views), but they are Iterable Containers, and I think you can talk about them as "views" well enough.

Again, surely you don’t want to tell people that sets, dicts, dict views, etc. are Sequences.

And if you say, “well, they aren’t Sequences but they are Containers”, that isn’t very helpful—a Container is a thing that supports “in”, which does happen to be true for those types, but it isn’t relevant, so that’s just confusing.

The word “view” _is_ great for things-like-dict-keys. That’s why I started off this thread asking for a view instead of an iterator, which I thought would be immediately clear. Unfortunately, it isn’t, or we wouldn’t even be having this discussion.

Though now that I've written that, maybe we Should have "Iterable" and "Iterator" as ABCs.

We already do. And Iterator is a subclass of Iterable, just as it should be.
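For reference, the existing ABCs live in collections.abc, and a quick check shows the relationship (the sample objects are illustrative):

```python
from collections.abc import Iterable, Iterator

print(issubclass(Iterator, Iterable))   # True

lst = [1, 2, 3]
gen = (x for x in lst)

print(isinstance(lst, Iterable), isinstance(lst, Iterator))   # True False
print(isinstance(gen, Iterable), isinstance(gen, Iterator))   # True True
```

Neither ABC separates lists from generators on the "can be iterated repeatedly" axis, which is the missing word being discussed.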

We don’t have an ABC for iterables that give you a new iterator over their contents, that doesn’t use up those contents, every time you iterate them. But that’s not surprising given that we don’t have a word for it. ABCs are named based on either a protocol that already had a name (like Sequence or Coroutine or Rational) or a single method (like Reversible and Hashable), not the other way around. (The only exception I can think of is the ones in io, but they just prove the point—nobody talks about BufferedIOBases as a concept like Sequences or Coroutines, and on the very rare occasions where I need to type-check one, I have to go read the docs to see what I’m supposed to check and what it means to do so.)

But the distinction between iterators and things-like-list-and-so-on comes up earlier, and a lot more often, so a word for that would buy us a lot more.

And "iterable" doesn't work?

No, it doesn’t. You can’t use “iterable” to mean things like lists and sets but not generators and files, because iterators are every bit as iterable.

This would be like saying you can just use “animal” for things like dogs and people but not frogs and birds, or “number” for things like 1/4 and -3/17 but not e and pi, or “Christian” for people like Lutherans and Methodists but not Catholics and Orthodox, etc. We have words for concepts like “mammal” and “rational” and “Protestant”, because you can’t just say “animal” and “number” and “Christian” or you’re being confusing.