
Here's an idea I've had. How about instead of this:

itertools.islice(iterable, 7, 20)

We'll just have:

itertools.islice(iterable)[7:20]

Advantages:

1. More familiar slicing syntax.
2. No need to awkwardly use None when you're interested in just specifying the end of the slice without specifying the start, i.e. islice(x)[:10] instead of islice(x, None, 10).
3. Doesn't require breaking backwards compatibility.

What do you think?
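A rough sketch of how such an interface could be emulated today, using a hypothetical wrapper class (sliceable_islice is made up for illustration; it is not the real itertools API):

```python
import itertools

class sliceable_islice:
    """Hypothetical wrapper: lazy slicing with subscript syntax."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __getitem__(self, index):
        # Only slices make sense here; single indices are rejected.
        if not isinstance(index, slice):
            raise TypeError("only slice subscripts are supported")
        # Delegate to the real itertools.islice; None components are
        # fine, since islice itself defaults start/stop/step sensibly.
        return itertools.islice(self._iterable, index.start, index.stop, index.step)

print(list(sliceable_islice(range(100))[7:20]))   # same as islice(range(100), 7, 20)
print(list(sliceable_islice(range(100))[:10]))    # no awkward None needed
```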

On Sat, May 9, 2020 at 11:15 AM Ram Rachum <ram@rachum.com> wrote:
Here's an idea I've had. How about instead of this:
itertools.islice(iterable, 7, 20)
We'll just have:
itertools.islice(iterable)[7:20]
Advantages:
1. More familiar slicing syntax.
2. No need to awkwardly use None when you're interested in just specifying the end of the slice without specifying the start, i.e. islice(x)[:10] instead of islice(x, None, 10).
3. Doesn't require breaking backwards compatibility.
What do you think?
Looking at this, my train of thought was: While we're at it, why not allow slicing generators? And if we do that, what about regular indexing? But then, what if I do `gen[3]` followed by `gen[1]`? Is it an error? Does the generator have to store its past values? Or is `gen[1]` the second item after `gen[3]`? Or wherever the generator last stopped?

Well, that's probably why I can't index or slice generators - so that code doesn't accidentally make a mess trying to treat a transient iterator the way it does a concrete sequence. A generator says "you can only iterate over me, don't try anything else".

Which leads us back to your proposal. `islice(iterable)[7:20]` looks nice, but it also allows `foo(islice(iterable))` where `foo` can do its own indexing, and that's leading to dangerous territory.

On Sat, May 9, 2020 at 8:00 PM Alex Hall <alex.mojaki@gmail.com> wrote:
On Sat, May 9, 2020 at 11:15 AM Ram Rachum <ram@rachum.com> wrote:
Here's an idea I've had. How about instead of this:
itertools.islice(iterable, 7, 20)
We'll just have:
itertools.islice(iterable)[7:20]
Advantages:
1. More familiar slicing syntax.
2. No need to awkwardly use None when you're interested in just specifying the end of the slice without specifying the start, i.e. islice(x)[:10] instead of islice(x, None, 10).
3. Doesn't require breaking backwards compatibility.
What do you think?
Looking at this, my train of thought was:
While we're at it, why not allow slicing generators?
Bear in mind that islice takes any iterable, not just a generator. I don't think there's a lot of benefit in adding a bunch of methods to generator objects - aside from iteration, the only functionality they have is coroutine-based. There's no point implementing half of itertools on generators, while still needing to keep itertools itself for all other iterables.
And if we do that, what about regular indexing? But then, what if I do `gen[3]` followed by `gen[1]`? Is it an error? Does the generator have to store its past values? Or is `gen[1]` the second item after `gen[3]`? Or wherever the generator last stopped?
It makes no sense to subscript a generator like that.
Well that's probably why I can't index or slice generators - so that code doesn't accidentally make a mess trying to treat a transient iterator the way it does a concrete sequence. A generator says "you can only iterate over me, don't try anything else".
Which leads us back to your proposal. `islice(iterable)[7:20]` looks nice, but it also allows `foo(islice(iterable))` where `foo` can do its own indexing and that's leading to dangerous territory.
If foo can do its own indexing, it needs to either specify that it takes a Sequence, not just an Iterable, or alternatively it needs to coalesce its argument into a list immediately. If it's documented as taking any iterable, it has to just iterate over it, without subscripting.

ChrisA
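The contract distinction described here can be sketched with type hints (the function names are made up for illustration):

```python
from typing import Iterable, Sequence, Tuple

def first_and_last_seq(xs: Sequence[int]) -> Tuple[int, int]:
    # Declaring Sequence documents that this function will subscript
    # its argument, so callers must not pass a one-shot iterator.
    return xs[0], xs[-1]

def first_and_last_any(xs: Iterable[int]) -> Tuple[int, int]:
    # Declaring only Iterable: coalesce into a list before any indexing.
    items = list(xs)
    return items[0], items[-1]

print(first_and_last_seq([1, 2, 3]))                # a list is a Sequence
print(first_and_last_any(x * x for x in range(4)))  # any iterable works here
```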

Funny you should bring this up. I've been meaning, for literally years, to propose not quite this, but adding a "slice iterator" to the sequence protocol. (Though note that one alternative is adding slice syntax to itertools.islice.)

I even got so far as to write a draft PEP and prototype. NOTE: I'm not saying this is ready for a PEP, but it was helpful to use the format to collect my thoughts.

https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst

And the prototype implementation:

https://github.com/PythonCHB/islice-pep/blob/master/islice.py

I never got around to posting here, as I wasn't quite finished, and was waiting till I had time to deal with the discussion. But since it was brought up -- here we go! If folks have an interest in this, I'd love to get feedback.

-CHB

On Sat, May 9, 2020 at 3:51 AM Chris Angelico <rosuav@gmail.com> wrote:
On Sat, May 9, 2020 at 8:00 PM Alex Hall <alex.mojaki@gmail.com> wrote:
On Sat, May 9, 2020 at 11:15 AM Ram Rachum <ram@rachum.com> wrote:
Here's an idea I've had. How about instead of this:
itertools.islice(iterable, 7, 20)
We'll just have:
itertools.islice(iterable)[7:20]
Advantages:
1. More familiar slicing syntax.
2. No need to awkwardly use None when you're interested in just specifying the end of the slice without specifying the start, i.e. islice(x)[:10] instead of islice(x, None, 10).
3. Doesn't require breaking backwards compatibility.
What do you think?
Looking at this, my train of thought was:
While we're at it, why not allow slicing generators?
Bear in mind that islice takes any iterable, not just a generator. I don't think there's a lot of benefit in adding a bunch of methods to generator objects - aside from iteration, the only functionality they have is coroutine-based. There's no point implementing half of itertools on generators, while still needing to keep itertools itself for all other iterables.
And if we do that, what about regular indexing? But then, what if I do `gen[3]` followed by `gen[1]`? Is it an error? Does the generator have to store its past values? Or is `gen[1]` the second item after `gen[3]`? Or wherever the generator last stopped?
It makes no sense to subscript a generator like that.
Well that's probably why I can't index or slice generators - so that code doesn't accidentally make a mess trying to treat a transient iterator the way it does a concrete sequence. A generator says "you can only iterate over me, don't try anything else".
Which leads us back to your proposal. `islice(iterable)[7:20]` looks nice, but it also allows `foo(islice(iterable))` where `foo` can do its own indexing and that's leading to dangerous territory.
If foo can do its own indexing, it needs to either specify that it takes a Sequence, not just an Iterable, or alternatively it needs to coalesce its argument into a list immediately. If it's documented as taking any iterable, it has to just iterate over it, without subscripting.
ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/WADS4D...
Code of Conduct: http://python.org/psf/codeofconduct/
--
Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On May 9, 2020, at 12:38, Christopher Barker <pythonchb@gmail.com> wrote:
https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst
I haven’t read the whole thing yet, but one thing immediately jumped out at me:
and methods on containers, such as dict.keys return iterators in Python 3,
No they don’t. They return views—objects that are collections in their own right (in particular, they’re not one-shot; they can be iterated over and over) but just delegate to another object rather than storing the data. People also commonly say that range is an iterator instead of a function that returns a list in Python 3, and that’s wrong for the same reason.

And this is important here, because a view is what you ideally _want_. The reason range, key views, etc. are views rather than iterators isn’t that it’s easier to implement or explain or anything; it’s that it’s a little harder to implement and explain but so much more useful that it’s worth it. It’s something people take advantage of all the time in real code.

And this is pretty easy to implement. I have a quick and dirty version at https://github.com/abarnert/slices, but I think I may have a better version somewhere with more unit tests.

For prior art specifically on slicing as a view, rather than just views in general, see memoryview (which only works on buffers, not all sequences) and NumPy (which is weird in many ways, but people rely on slicing giving you a storage-sharing view).

The reason I never proposed this for the stdlib (even though that would allow adding methods directly onto the builtin container types, as your proposal does) is that I always want to build a _complete_ view library, with replacements for map, zip, enumerate, all of itertools, etc., and with enough cleverness to present exactly as much functionality as is possible. But just replacing islice is a much simpler task (mainly because the input has to be a sequence and the output is always a sequence, so the only complexity that arises is whether you want to allow mutable views into mutable sequences), and it may well be useful on its own.
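The view-versus-iterator distinction described above can be checked directly; a quick sketch (relies on dicts preserving insertion order, Python 3.7+):

```python
d = {"a": 1, "b": 2}
keys = d.keys()                 # a view: lazy, but not one-shot

print(list(keys))               # ['a', 'b']
print(list(keys))               # ['a', 'b'] -- iterable again and again

d["c"] = 3                      # the view reflects later mutations
print(list(keys))               # ['a', 'b', 'c']

it = iter(d)                    # an iterator, by contrast, is one-shot
print(list(it))                 # ['a', 'b', 'c']
print(list(it))                 # [] -- exhausted
```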

On Sat, May 9, 2020 at 1:03 PM Andrew Barnert <abarnert@yahoo.com> wrote:
https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst
I haven’t read the whole thing yet, but one thing immediately jumped out at me:
and methods on containers, such as dict.keys return iterators in Python 3,
No they don’t. They return views—objects that are collections in their own right (in particular, they’re not one-shot; they can be iterated over and over) but just delegate to another object rather than storing the data.

Thanks -- that's the kind of thing that led me to say that this is probably not ready for a PEP. But I don't think that invalidates the idea at all -- there is debate about what an "islice" should return, but an iterable view would be a good option. I'm inclined to think that it would be a bad idea to have it return a full sequence view object, and not sure it should do anything other than be iterable.

People also commonly say that range is an iterator instead of a function that returns a list in Python 3,

Sure, but I don't say that :-) -- a range object is actually a pretty full immutable sequence -- which is pretty handy. But when people say that, they are often being careless, rather than wrong. At least I'd like to claim that about my saying dict.keys() returns an iterator ;-) -- the point of that part of the document is that many things in Py3 do NOT return fully realized copies, like Py2 did.
And this is important here, because a view is what you ideally _want_. The reason range, key view, etc. are views rather than iterators isn’t that it’s easier to implement or explain or anything, it’s that it’s a little harder to implement and explain but so much more useful that it’s worth it. It’s something people take advantage of all the time in real code.
Maybe -- but "all the time?" I'd venture to say that absolutely the most common thing done with, e.g., dict.keys() is to iterate over it. But yes, having it be a view with other features is handy.
And this is pretty easy to implement. I have a quick and dirty version at https://github.com/abarnert/slices, but I think I may have a better version somewhere with more unit tests.
Thanks -- I'll take a look.
For prior art specifically on slicing as a view, rather than just views in general, see memoryview (which only works on buffers, not all sequences) and NumPy (which is weird in many ways, but people rely on slicing giving you a storage-sharing view)
I am a long-time numpy user, and yes, I very much take advantage of the memory-sharing view. But I do not think that would be a good idea for the standard library. numpy slices return a full-fledged numpy array, which shares a data view with its "host" -- this is really helpful for performance reasons -- moving large blocks of data around is expensive, but it's also pretty confusing. And it would be a lot more problematic with, e.g., lists, as the underlying buffer can be reallocated -- numpy arrays are mutable, but not re-sizable; once you've made one, its data buffer does not change.

The reason I never proposed this for the stdlib (even though that would allow adding methods directly onto the builtin container types, as your proposal does) is that I always want to build a _complete_ view library, with replacements for map, zip, enumerate, all of itertools, etc., and with enough cleverness to present exactly as much functionality as is possible.

And I have my doubts about it anyway :-)
But just replacing islice is a much simpler task (mainly because the input has to be a sequence and the output is always a sequence, so the only complexity that arises is whether you want to allow mutable views into mutable sequences), and it may well be useful on its own.
Agreed. And while yes, dict_keys and friends are not JUST iterators, they also aren't very functional views, either. They are not sequences, certainly not mutable sequences. And:
(in particular, they’re not one-shot; they can be iterated over and over)
yes, but they are only a single iterator -- if you call iter() on one you always get the same one back, and its state is preserved. So yes, you can iterate over it more than once, but iter() only resets after it's been exhausted.

In short -- not having thought about it deeply at all, but I'm thinking that making a SliceIterator very similar to dict_keys and friends would make a lot of sense.

-CHB

--
Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On May 9, 2020, at 19:43, Christopher Barker <pythonchb@gmail.com> wrote:
On Sat, May 9, 2020 at 1:03 PM Andrew Barnert <abarnert@yahoo.com> wrote:
https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst
I haven’t read the whole thing yet, but one thing immediately jumped out at me:
and methods on containers, such as dict.keys return iterators in Python 3,
No they don’t. They return views—objects that are collections in their own right (in particular, they’re not one-shot; they can be iterated over and over) but just delegate to another object rather than storing the data.
Thanks -- that's the kind of thing that led me to say that this is probably not ready for a PEP.
but I don't think that invalidates the idea at all -- there is debate about what an "islice" should return, but an iterable view would be a good option.
I don’t think it invalidates the basic idea at all, just that it suggests the design should be different. Originally, dict returned lists for keys, values, and items. In 2.2, iterator variants were added. In 3.0, the list and iterator variants were both replaced with view versions, which were enough of an improvement that they were backported to 2.x. Because a view does cover almost all of the uses of both a sequence copy and an iterator. And I think the same is true here.
I'm inclined to think that it would be a bad idea to have it return a full sequence view object, and not sure it should do anything other than be iterable.
Why? What’s the downside to being able to do more with them for the same performance cost and only a little more up-front design work?
And this is important here, because a view is what you ideally _want_. The reason range, key view, etc. are views rather than iterators isn’t that it’s easier to implement or explain or anything, it’s that it’s a little harder to implement and explain but so much more useful that it’s worth it. It’s something people take advantage of all the time in real code.
Maybe -- but "all the time?" I'd venture to say that absolutely the most common thing done with, e.g., dict.keys() is to iterate over it.
Really? When I just want to iterate over a dict’s keys, I iterate the dict itself.
For prior art specifically on slicing as a view, rather than just views in general, see memoryview (which only works on buffers, not all sequences) and NumPy (which is weird in many ways, but people rely on slicing giving you a storage-sharing view)
I am a long-time numpy user, and yes, I very much take advantage of the memory sharing view.
But I do not think that would be a good idea for the standard library. numpy slices return a full-fledged numpy array, which shares a data view with its "host" -- this is really helpful for performance reasons -- moving large blocks of data around is expensive, but it's also pretty confusing. And it would be a lot more problematic with, e.g., lists, as the underlying buffer can be reallocated -- numpy arrays are mutable, but not re-sizable; once you've made one, its data buffer does not change.
That’s no more of a problem for a list slice view than for any of the existing views. The simplest way to implement a view is to keep a reference to the underlying object and delegate to it, which is effectively what the dict views do.

(Well, did from 2.x to 3.5. The dict improvements in 3.6 opened up an optimization opportunity, because in the split layout a dict is effectively a wrapper around a keys view and a separate table, so the keys view can refer directly to that thing that already exists. But that isn’t relevant here.)

(You _could_ instead refuse to allow expanding a sequence when there’s a live view, as bytearray does with memoryview, but I don’t think that’s necessary here. It’s only needed there as a consequence of the fact that the buffer protocol is provided in C rather than in Python. For a slice view, it would just make things more complicated and less functional for no good reason.)
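A minimal sketch of the keep-a-reference-and-delegate approach (SliceView is a hypothetical name for illustration, not Andrew's implementation):

```python
from collections.abc import Sequence

class SliceView(Sequence):
    """Read-only view of seq[start:stop] that delegates instead of copying."""

    def __init__(self, seq, start, stop):
        self._seq = seq
        self._start, self._stop = start, stop

    def __len__(self):
        # Recomputed on every call, so the view follows host resizes.
        start, stop, _ = slice(self._start, self._stop).indices(len(self._seq))
        return max(0, stop - start)

    def __getitem__(self, i):
        # Plain non-negative indices only, to keep the sketch short.
        if not isinstance(i, int) or not 0 <= i < len(self):
            raise IndexError(i)
        return self._seq[self._start + i]

data = list(range(10))
view = SliceView(data, 3, 7)
print(list(view))    # [3, 4, 5, 6]
data[4] = 99         # mutations to the host show through the view
print(list(view))    # [3, 99, 5, 6]
```

Iteration, `in`, `.index()`, and `.count()` all come for free from the `collections.abc.Sequence` mixin once `__len__` and `__getitem__` exist.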
But just replacing islice is a much simpler task (mainly because the input has to be a sequence and the output is always a sequence, so the only complexity that arises is whether you want to allow mutable views into mutable sequences), and it may well be useful on its own.
Agreed. And while yes, dict_keys and friends are not JUST iterators, they also aren't very functional views, either. They are not sequences,
That’s not true. They are very functional—as functional as reasonably makes sense. The only reason they’re not Sequences is that they’re views on dicts, so indexing makes little sense, but set operations do—and they are in fact Sets. (Except for values.)
certainly not mutable sequences.
Well, yes, but mutating a dict through its views wouldn’t make sense in the first place:

>>> d = {1: 2}
>>> k = d.keys()
>>> k |= 3

You’ve told it to add an item with key 3 without telling it what the value is, and there’s no reasonable thing that could mean. A slice view would have no such problem, so mutation is sensible.

That being said, mutation could easily be added later without breaking anything, and it does raise some nontrivial design issues (most obviously, notice that my implementation only allows non-size-changing mutations, because otherwise you have to decide whether it remains a view over seq[3:5] or becomes a view over seq[3:6]; all three options seem reasonable there, so I just went with the simplest, and have no good argument for why it’s the best…). So I think it might be better to leave mutation out of the original version anyway unless someone has a need for it (at which point we can use the examples to think through the best answers to the design issues).
And:
(in particular, they’re not one-shot; they can be iterated over and over)
yes, but they are only a single iterator -- if you call iter() on one you always get the same one back, and its state is preserved.
No, that’s not true. Each call to iter() returns a completely independent iterator each time, with its own independent state that starts at the head of the view. It works exactly the same way as a set, a tuple, or any other normal collection:

>>> d = {1: 2, 3: 4, 5: 6}
>>> k = d.keys()
>>> i1 = iter(k)
>>> next(i1)
1
>>> i2 = iter(k)
>>> next(i2)
1
>>> list(i1)
[3, 5]
>>> next(i2)
3

(This was a bit harder to see, and to explain, before 3.6, because that order was intentionally arbitrary, but it was guaranteed to be consistent until you mutated the dict.)

Also notice that while the views’ iterators are just like dict iterators, and list iterators for that matter, in that they can’t handle the dict being resized during iteration, the views themselves have no such trouble:

>>> d[7] = 8
>>> next(i1)
RuntimeError: dictionary changed size during iteration
>>> i3 = iter(k)
>>> next(i3)
1

Basically, views are not like iterators at all, except in that they save time and space by being lazy.
So yes, you can iterate over it more than once, but iter() only resets after it's been exhausted.
Such a resettable-iterator thing (which would have some precedent in file objects, I suppose) would actually be harder to implement, on top of being less powerful and potentially confusing. And the same is true for slices.
In short -- not having thought about it deeply at all, but I'm thinking that making a SliceIterator very similar to dict_keys and friends would make a lot of sense.
Yes, as long as that means being a full-featured normal collection (in this case a Sequence rather than a Set), not a resettable iterator.

On Sat, May 9, 2020 at 9:11 PM Andrew Barnert <abarnert@yahoo.com> wrote:

I don’t think it invalidates the basic idea at all, just that it suggests the design should be different. Originally, dict returned lists for keys, values, and items. In 2.2, iterator variants were added. In 3.0, the list and iterator variants were both replaced with view versions, which were enough of an improvement that they were backported to 2.x. Because a view does cover almost all of the uses of both a sequence copy and an iterator. And I think the same is true here.

Probably yes.

I'm inclined to think that it would be a bad idea to have it return a full sequence view object, and not sure it should do anything other than be iterable.

Why? What’s the downside to being able to do more with them for the same performance cost and only a little more up-front design work?

I'm not worried about the design work -- writing a PEP is a LOT more work than writing the code for this kind of thing :-) And I'll bet folks smarter than me will want to help out with the code part, if this goes anywhere.
And this is important here, because a view is what you ideally _want_. The reason range, key view, etc. are views rather than iterators isn’t that it’s easier to implement or explain or anything, it’s that it’s a little harder to implement and explain but so much more useful that it’s worth it. It’s something people take advantage of all the time in real code.
Maybe -- but "all the time?" I'd venture to say that absolutely the most common thing done with, e.g., dict.keys() is to iterate over it.

Really? When I just want to iterate over a dict’s keys, I iterate the dict itself.

True -- I was thinking more of ALL the various "iterables that were concretized lists in py2" -- dict_keys() is actually unique in that the dict itself provides an iterator of the keys. I've seen a lot of code like so:

for k in dict.keys():
    ...

and:

if k in dict.keys():
    ...

both of which are completely unnecessary. So actually, I'd say that dict.keys() gets used either less often, or when it's not really needed. But you're right: given that, when dict_keys is used when it should be, it would be for other reasons. I'll bet it's kind of rare, though. And dict_items and dict_values are probably most often used as iterables.
That’s no more of a problem for a list slice view than for any of the existing views. The simplest way to implement a view is to keep a reference to the underlying object and delegate to it, which is effectively what the dict views do.
(You _could_ instead refuse to allow expanding a sequence when there’s a live view, as bytearray does with memoryview, but I don’t think that’s necessary here. It’s only needed there as a consequence of the fact that the buffer protocol is provided in C rather than in Python. For a slice view, it would just make things more complicated and less functional for no good reason.)

Fair enough. Though you could still get potentially surprising behavior if the original sequence's length is changed. But it would also be, well, weird -- you create a view with a slice of a given length, and then the underlying sequence is changed, and then your view object is, well, totally different; it may not even exist (well, be length-zero, I suppose). And you probably don't want to lock the "host" anyway -- that could be very confusing if the view is kept somewhere far from the code trying to change the sequence. This is all a bit less complicated for the dict views, because none of them are providing a mapping interface anyway.

The other question is -- should a view of a mutable sequence be mutable (and mutate the underlying sequence)? That's how numpy arrays work, but it does require a certain care to keep track of.
But just replacing islice is a much simpler task (mainly because the input has to be a sequence and the output is always a sequence, so the only complexity that arises is whether you want to allow mutable views into mutable sequences), and it may well be useful on its own.
Agreed. And while yes, dict_keys and friends are not JUST iterators, they also aren't very functional views, either. They are not sequences,

That’s not true. They are very functional—as functional as reasonably makes sense. The only reason they’re not Sequences is that they’re views on dicts, so indexing makes little sense, but set operations do—and they are in fact Sets. (Except for values.)

certainly not mutable sequences.

Well, yes, but mutating a dict through its views wouldn’t make sense in the first place:

>>> d = {1: 2}
>>> k = d.keys()
>>> k |= 3

Not for keys, but it would at least be possible for dict_items, and even potentially for dict_values, though yes, that would be really confusing.

So I think it might be better to leave mutation out of the original version anyway unless someone has a need for it (at which point we can use the examples to think through the best answers to the design issues).

Yeah, I'm heading that way too.
yes, but they are only a single iterator -- if you call iter() on one you always get the same one back, and it's state is preserved.
No, that’s not true. Each call to iter() returns a completely independent iterator each time, with its own independent state that starts at the head of the view.

Sorry -- total brain blip on my part -- I tested that out before I posted, but had a typo that totally invalidated the test -- arrg!

I'm still a bit confused about what a dict.* view actually is -- for instance, a dict_keys object pretty much acts like a set, but it isn't a subclass of set, and it has an isdisjoint() method, but not .union or any of the other set methods. But it does have what at a glance looks like a pretty complete set of dunders...

Anyway, a Sequence view is simpler, because it could probably simply be an immutable sequence -- not much need for contemplating every bit of the API.

I do see a possible objection here, though. Making a small view of a large sequence would keep that sequence alive, which could be a memory issue -- which is one reason why slices don't do that by default. And it could simply be a buyer-beware issue. But the more featureful you make a view, the more likely it is that it will get used and passed around and kept alive without the programmer realizing the implications of that. Food for thought.

Now I need to think about how to write this all up -- which is why I wasn't sure I was ready to bring this up, but now I have, so more to do! PRs accepted on my draft!

https://github.com/PythonCHB/islice-pep/blob/master/islice.py

>>> d[7] = 8
>>> next(i1)
RuntimeError: dictionary changed size during iteration
>>> i3 = iter(k)
>>> next(i3)
1

That's probably a feature we'd want to emulate.
Basically, views are not like iterators at all, except in that they save time and space by being lazy.
Well, this is a vocabulary issue -- an "iterable" is anything that follows the protocol, so yes, these very much ARE iterables even though they also have some additional behavior. Which is why it's not wrong to say that a range object is an iterable, but it IS wrong to say that it's JUST an iterator...
Such a resettable-iterator thing (which would have some precedent in file objects, I suppose) would actually be harder to Implement, on top of being less powerful and potentially confusing. And the same is true for slices.
but the dict_keys iterator does seem to do that...

In [48]: dk
Out[48]: dict_keys(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])

In [49]: list(dk)
Out[49]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [50]: list(dk)
Out[50]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In short -- not having thought about it deeply at all, but I'm thinking that making a SliceIterator very similar to dict_keys and friends would make a lot of sense.

Yes, as long as that means being a full-featured normal collection (in this case a Sequence rather than a Set), not a resettable iterator.

Yup -- I was pretty much only disagreeing due to my ignorance of the dict views -- thanks for the lesson!

-CHB

--
Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On May 10, 2020, at 11:09, Christopher Barker <pythonchb@gmail.com> wrote:

Is there any way you can fix the reply quoting on your mail client, or manually work around it? I keep reading paragraphs and saying “why is he saying the same thing I said,” only to realize that you’re not, that’s just a quote from me that isn’t marked, up until the last line where it isn’t…
On Sat, May 9, 2020 at 9:11 PM Andrew Barnert <abarnert@yahoo.com> wrote:
That’s no more of a problem for a list slice view than for any of the existing views. The simplest way to implement a view is to keep a reference to the underlying object and delegate to it, which is effectively what the dict views do.
Fair enough. Though you still could get potentially surprising behavior if the original sequence's length is changed.
I don’t think it’s surprising. When you go out of your way to ask for a dynamic view instead of the default snapshot copy, and then you change the list, you’d expect the view to change. If you don’t keep views around, because you’re only using them for more efficient one-shot iteration, you might never think about that, but then you’ll never notice it to be surprised by it. The dynamic behavior of dict views presumably hasn’t ever surprised you in the 12 years it’s worked that way.
And you probably don't want to lock the "host" anyway -- that could be very confusing if the view is kept somewhere far from the code trying to change the sequence.
Yes. I think memoryview’s locking behavior is a special case, not something we’d want to emulate here. I’m guessing many people just never use memoryview at all, but when you do, you’re generally thinking about raw buffers rather than abstract behavior. (It’s right there in the name…) And when you need something more featureful than an invisible hard lock on the host, it’s time for numpy. :)
I'm still a bit confused about what a dict.* view actually is
The docs explain it reasonably well. See https://docs.python.org/3/glossary.html#term-dictionary-view for the basic idea, https://docs.python.org/3/library/stdtypes.html#dict-views for the details on the concrete types, and I think the relevant ABCs and data model entries are linked from there.
-- for instance, a dict_keys object pretty much acts like a set, but it isn't a subclass of set, and it has an isdisjoint() method, but not .union or any of the other set methods. But it does have what at a glance looks like pretty complete set of dunders....
The point of collections.abc.Set, and ABCs in general, and the whole concept of protocols, is that the set protocol can be implemented by different concrete types—set, frozenset, dict_keys, third-party types like sortedcontainers.SortedSet or pyobjc.Foundation.NSSet, etc.—that are generally completely unrelated to each other, and implemented in different ways—a dict_keys is a link to the keys table in a dict somewhere, a set or frozenset has its own hash table, a SortedSet has a wide-B-tree-like structure, an NSSet is a proxy to an ObjC object, etc. If they all had to be subclasses of set, they'd be carrying around a set's hash table but never using it; they'd have to be careful to override every method to make sure it never accidentally got used (and what would frozenset or dict_keys override add with?), etc. And if you look at the ABC, union isn't part of the protocol, but __or__ is, and so on.
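A quick demonstration of that protocol-vs-subclass point with dict_keys:

```python
import collections.abc

d = {"a": 1, "b": 2}
keys = d.keys()

# dict_keys implements the Set protocol without subclassing set:
assert isinstance(keys, collections.abc.Set)
assert not isinstance(keys, set)

# The dunders are there, so the set operators work...
assert keys | {"c"} == {"a", "b", "c"}
assert keys & {"a"} == {"a"}

# ...but the named set methods like union() are not part of the protocol:
assert not hasattr(keys, "union")
```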
Anyway, a Sequence view is simpler, because it could probably simply be an immutable sequence -- not much need for contemplating every bit of the API.
It’s really the same thing, it’s just the Sequence protocol rather than the Set protocol. If anything, it’s _less_ simple, because for sequences you have to decide whether indexing should work with negative indices, extended slices, etc., which the protocol is silent about. But the answer there is pretty easy—unless there’s a good reason not to support those things, you want to support them. (The only open question is when you’re designing a sequence that you expect to be subclassed, but I don’t think we’re designing for subclassing here.)
I do see a possible objection here though. Making a small view of a large sequence would keep that sequence alive, which could be a memory issue. Which is one reason why slices don't do that by default.
Yes. When you just want to iterate something once, non-lazily, you don’t care whether it’s a view of a snapshot, but when you want to keep it around, you do care, and you have to decide which one you want. So we certainly can’t change the default; that would be a huge but subtle change that would break all kinds of code. But I don’t think it’s a problem for offering an alternative that people have to explicitly ask for. Also, notice that this is true for all of the existing views, and none of them try to be un-featureful to avoid it.
And it could simply be a buyer beware issue. But the more featureful you make a view, the more likely it is that they will get used and passed around and kept alive without the programmer realizing the implications of that.
I think it is worth mentioning in the docs.
Now I need to think about how to write this all up -- which is why I wasn't sure I was ready to bring this up, but now I have, so more to do!
Feel free to borrow whatever you want (and discard whatever you don’t want) from the slices repo I posted. (It’s MIT-licensed, but I can relicense it to remove the copyright notice if you want.) I think the biggest question is actually the API. Making this a function (or a class that most people think of as a function, like most of itertools) is easy, but as soon as you say it should be a method or property of sequences, that’s trickier. You can add it to all the builtin sequence types, but should other sequences in the stdlib have it? Should Sequence provide it as a mixin? Should it be part of the sequence protocol, and therefore checked by Sequence as an ABC (even though that could be a breaking change)?
PRs accepted on my draft!
https://github.com/PythonCHB/islice-pep/blob/master/islice.py
>>> d[7] = 8
>>> next(i1)
RuntimeError: dictionary changed size during iteration
>>> i3 = iter(k)
>>> next(i3)
That's probably a feature we'd want to emulate.
Basically, views are not like iterators at all, except in that they save time and space by being lazy.
Well, this is a vocabulary issue -- an "iterable" and "iterator" is anything that follows the protocol, so yes, they very much ARE iterables (and iterators) even though they also have some additional behavior.
Which is why it's not wrong to say that a range object is an iterator, but it IS wrong to say that it's JUST an iterator ...
No, they’re not iterators. You’ve got it backward—every iterator is an iterable, but most iterables are not iterators. An iterator is an iterable that has a __next__ method and returns self from __iter__. Lists, tuples, dicts, etc. are not iterators, and neither are ranges, or the dict views. You can test this easily:

>>> isinstance(range(10), collections.abc.Iterator)
False

A lot of people get this confused. I think the problem is that we don’t have a word for “iterable that’s not an iterator”, or for the refinement “iterable that’s not an iterator and is reusable”, much less the further refinement “iterable that’s reusable, providing a distinct iterator that starts from the head each time, and allows multiple such iterators in parallel”. But that last thing is exactly the behavior you expect from “things like list, dict, etc.”, and it’s hard to explain, and therefore hard to document. The closest word for that is “collection”, but Collection is also a protocol that adds being a Container and being Sized on top of being Iterable, so it’s misleading unless you’re really careful. So the docs don’t clearly tell people that range, dict_keys, etc. are exactly that “like list, dict, etc.” thing, so people are confused about what they are. People know they’re lazy, they know iterators are lazy, so they think they’re a kind of iterator, and the docs don’t ever make it clear why that’s wrong.
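Spelling out the reusable-collection vs. one-shot-iterator distinction:

```python
from collections.abc import Iterable, Iterator

r = range(10)

# range is an iterable collection, not an iterator:
assert isinstance(r, Iterable)
assert not isinstance(r, Iterator)

# Calling iter() on it produces an iterator, which returns itself from __iter__:
it = iter(r)
assert isinstance(it, Iterator)
assert iter(it) is it

# The collection is reusable; the iterator is one-shot:
assert list(r) == list(r)           # same result every time
assert list(it) == list(range(10))  # consumes the iterator...
assert list(it) == []               # ...which is now exhausted
```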
Such a resettable-iterator thing (which would have some precedent in file objects, I suppose) would actually be harder to implement, on top of being less powerful and potentially confusing. And the same is true for slices.
but the dict_keys iterator does seem to do that ...
In [48]: dk Out[48]: dict_keys(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
In [49]: list(dk) Out[49]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
In [50]: list(dk) Out[50]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
You just picked an example where “resettable iterator” and “collection” would do the same thing. Try the same test with list and it also passes, because list is a collection. You can only distinguish the two cases by partially using an iterator and then asking for another one. And if you do that, you will see that, just like list, dict_keys gives you a brand new, completely independent iterator, initialized from the start, every time you call iter() on it. Because, like list, dict_keys is a collection, not an iterator. There are no types in Python’s stdlib that have the behavior you suggested of being an iterator but resetting each time you iterate. (The closest thing is file objects, but you have to manually reset them with seek(0).)
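The partial-consumption test described above, sketched out:

```python
d = {str(i): None for i in range(5)}
dk = d.keys()

i1 = iter(dk)
assert next(i1) == "0"
assert next(i1) == "1"

# A second iter() call gives a brand new iterator starting from the head,
# not a "reset" of the first one:
i2 = iter(dk)
assert next(i2) == "0"

# ...and the first iterator keeps its own position:
assert next(i1) == "2"
```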

On Sun, May 10, 2020 at 12:48 PM Andrew Barnert <abarnert@yahoo.com> wrote:
Is there any way you can fix the reply quoting on your mail client, or manually work around it?
I'm trying -- sorry I've missed a few. It seems more and more "modern" email clients make "interspersed" posting really hard. But I hate bottom posting maybe even more than top posting :-( (gmail seems to have magically gotten worse in this regard recently)
If you don’t keep views around, because you’re only using them for more efficient one-shot iteration, you might never think about that, but then you’ll never notice it to be surprised by it. The dynamic behavior of dict views presumably hasn’t ever surprised you in the 12 years it’s worked that way.
True -- though the dict views aren't mappings themselves, and thus *maybe* less useful, but certainly less tempting to use where you might have otherwise used the original dict or a copy. But sure -- if you have to go out of your way to get it, then you should know what the implications are.
And you probably don't want to lock the "host" anyway -- that could be
very confusing if the view is kept alive somewhere far from the code trying to change the sequence.
Yes. I think memoryview’s locking behavior is a special case, not something we’d want to emulate here. I’m guessing many people just never use memoryview at all, but when you do, you’re generally thinking about raw buffers rather than abstract behavior. (It’s right there in the name…) And when you need something more featureful than an invisible hard lock on the host, it’s time for numpy. :)
Yeah, memoryviews are a pretty special case, I don't think they are really intended to be used much in "user code" rather than libraries with pretty special cases.

The docs explain it reasonably well. See
https://docs.python.org/3/glossary.html#term-dictionary-view for the basic idea, https://docs.python.org/3/library/stdtypes.html#dict-views for the details on the concrete types, and I think the relevant ABCs and data model entries are linked from there.
I was surprised to see that there are ABCs for the Mapping Views as well -- that does make it clear.

The point of collections.abc.Set, and ABCs in general, and the whole
concept of protocols, is that the set protocol can be implemented by different concrete types—set, frozenset, dict_keys, third-party types like sortedcontainers.SortedSet or pyobjc.Foundation.NSSet, etc.—that are generally completely unrelated to each other, and implemented in different ways—a
That I knew -- what surprised me was that the "standard" set methods aren't part of the ABC.

It's also interesting to note (from another part of this thread) that slicing isn't part of the Sequence ABC, or any "official" protocol. I do see this, though not entirely sure what to make of it: https://docs.python.org/3/c-api/sequence.html?highlight=sequence

Anyway, a Sequence view is simpler, because it could probably simply be an immutable sequence -- not much need for contemplating every bit of the API.

It's really the same thing, it's just the Sequence protocol rather than the Set protocol.

Well, dict_keys is a set, and dict_items is *sometimes* a set, and dict_values is not a set (but is a Sized Collection).

If anything, it's _less_ simple, because for sequences you have to decide
whether indexing should work with negative indices, extended slices, etc., which the protocol is silent about. But the answer there is pretty easy—unless there’s a good reason not to support those things, you want to support them.
Agreed -- protocol or not, the point would be for a sequence_view to be as much like the built-in sequences as possible. And as far as my motivation for all this goes -- getting that nifty slicing behavior is the main point!
(The only open question is when you’re designing a sequence that you expect to be subclassed, but I don’t think we’re designing for subclassing here.)
Nope.

I do see a possible objection here though. Making a small view of a large sequence would keep that sequence alive, which could be a memory issue. Which is one reason why slices don't do that by default.
But I don’t think it’s a problem for offering an alternative that people have to explicitly ask for.
Probably not.

Also, notice that this is true for all of the existing views, and none of them try to be un-featureful to avoid it.

But there is no full-featured mapping-view that otherwise acts much like a mapping. In theory, there *could* be -- if there was some nice way to specify a subset of a mapping without copying the whole thing -- I can't think of one at the moment. But having a view that really does act like the original I think makes it more tempting to use more broadly. But again, you will have had to ask for it.

And it could simply be a buyer beware issue. But the more featureful you make a view, the more likely it is that they will get used and passed around and kept alive without the programmer realizing the implications of that.
I think it is worth mentioning in the docs.
Absolutely.

Feel free to borrow whatever you want (and discard whatever you don’t want)
from the slices repo I posted. (It’s MIT-licensed, but I can relicense it to remove the copyright notice if you want.)
great, thanks!
I think the biggest question is actually the API. Making this a function (or a class that most people think of as a function, like most of itertools) is easy, but as soon as you say it should be a method or property of sequences, that’s trickier. You can add it to all the builtin sequence types, but should other sequences in the stdlib have it? Should Sequence provide it as a mixin? Should it be part of the sequence protocol, and therefore checked by Sequence as an ABC (even though that could be a breaking change)?
Here is where I think you (Andrew) and I (Chris B.) differ in our goals. My goal here is to have an easily accessible way to use the slice syntax to get an iterable that does not make a copy. While we're at it, getting a sequence view that can provide an iterator, and all sorts of other nifty features, is great. But making it a callable in itertools (or any other module) wouldn't accomplish that goal.

Hmm, but maybe not that bad:

for i in itertools.seq_view(a_list)[::2]:
    ...

I still think I prefer this though:

for i in a_list.view[::2]:
    ...

So to all those questions: I say "yes" except maybe: "checked by Sequence as an ABC (even though that could be a breaking change)" -- because, well, breaking changes are "Not good".

I wonder if there is a way to make something standard, but not quite break things -- hmm. For instance: It seems to be possible to have Sequence provide it as a mixin, but not have it checked by Sequence as an ABC? For instance, I note that the Mapping ABC has .keys, .items, .values, and there are ABCs for MappingViews, but I can't see it defined anywhere that the Mapping methods have to produce View objects.

Maybe the only way to do this in a non-breaking way is to add a new ViewableSequence ABC. Is there precedent here? Have any of the (major) ABCs grown new features since they were introduced? I also can't find mixins that provide features that aren't part of the standard protocols -- but I haven't looked very hard, either.

People know they're lazy, they know iterators are lazy, so they think they're a kind of iterator, and the docs don't ever make it clear why that's wrong.

Right -- I think a lot of the confusion around the vocabulary is that people think of "iterators" as being "lazy", and the term gets used a lot when laziness is really the key point. (A lot of confusion like that around "generator" as well.) Heck, I got it wrong just then, when I was trying to be careful.
-Chris -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On May 10, 2020, at 15:39, Christopher Barker <pythonchb@gmail.com> wrote:
On Sun, May 10, 2020 at 12:48 PM Andrew Barnert <abarnert@yahoo.com> wrote:
Is there any way you can fix the reply quoting on your mail client, or manually work around it?
I'm trying -- sorry I've missed a few. It seems more and more "modern" email clients make "interspersed" posting really hard. But I hate bottom posting maybe even more than top posting :-( (gmail seems to have magically gotten worse in this regard recently)
It seems like the one place Google still sees (the remnants of) Yahoo as a competitor is who can screw up mailing lists worse.
It's also interesting to note (from another part of this thread) that slicing isn't part of the Sequence ABC, or any? "official" protocol?
If we still had separate __getitem__ and __getslice__ when ABCs and the idea of being clearer about protocols had come along, I’ll bet __getslice__ would have been made part of the protocol. But I suppose it’s a little too late for me to complain about a change that I think went in even before new-style classes. :)
I do see this, though not entirely sure what to make of it:
https://docs.python.org/3/c-api/sequence.html?highlight=sequence
Yeah, the fact that sequences and mappings have identical methods means that from Python those two protocols are opt-in rather than automatic, while from C you have to be more prepared for errors after checking than with other protocols. Annoying, but not using the same syntax and dunders for indexing and keying would be a lot more annoying.
Also, notice that this is true for all of the existing views, and none of them try to be un-featureful to avoid it.
But there is no full featured mapping-view that otherwise acts much like a mapping.
types.MappingProxyType. In most cases, type(self).__dict__ will get you one of these. But of course this is a view of the whole dict, not a subset.
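For example (MappingProxyType lives in the stdlib's types module):

```python
from types import MappingProxyType

d = {"a": 1}
proxy = MappingProxyType(d)

# The proxy is a full read-only mapping over the live dict:
assert proxy["a"] == 1
d["b"] = 2
assert proxy["b"] == 2   # changes to the host show through

# ...but writing through the proxy is forbidden:
try:
    proxy["c"] = 3
except TypeError:
    pass  # 'mappingproxy' object does not support item assignment
else:
    raise AssertionError("expected TypeError")
```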
in theory, there *could* be -- if there was some nice way to specify a subset of a mapping without copying the whole thing -- I can't think of one at the moment.
Not in the stdlib, but for a SortedDict type, key-slicing makes total sense, and many of them do it—although coming up with a nice API is hard enough that they all seem to do it differently. (Obviously d[lo:hi] should be some iterable of the values from the keys lo<=key<hi, and obviously it should also be a subtree dict with just those keys, and it can’t be both, so you need at least one way to spell an alternative, and maybe as many as a dozen…)
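A minimal sketch of that kind of key-slicing, using a hypothetical SortedSliceDict built on bisect (real sorted-container libraries do this differently; the d[lo:hi]-returns-values spelling is just one of the alternatives mentioned):

```python
import bisect

class SortedSliceDict:
    """Toy sorted mapping supporting key-range slicing d[lo:hi].

    Hypothetical sketch only; not any library's actual API."""

    def __init__(self, items):
        self._keys = sorted(k for k, _ in items)
        self._map = dict(items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Find the positions of the key range lo <= k < hi.
            lo = 0 if key.start is None else bisect.bisect_left(self._keys, key.start)
            hi = len(self._keys) if key.stop is None else bisect.bisect_left(self._keys, key.stop)
            # One possible spelling: return the values for those keys.
            return [self._map[k] for k in self._keys[lo:hi]]
        return self._map[key]

d = SortedSliceDict([(3, "c"), (1, "a"), (2, "b"), (5, "e")])
print(d[2:5])  # values for keys 2 <= k < 5: ['b', 'c']
```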
I think the biggest question is actually the API. Making this a function (or a class that most people think of as a function, like most of itertools) is easy, but as soon as you say it should be a method or property of sequences, that’s trickier. You can add it to all the builtin sequence types, but should other sequences in the stdlib have it? Should Sequence provide it as a mixin? Should it be part of the sequence protocol, and therefore checked by Sequence as an ABC (even though that could be a breaking change)?
Here is where I think you (Andrew) and I (Chris B.) differ in our goals. My goal here is to have an easily accessible way to use the slice syntax to get an iterable that does not make a copy.
It’s just a small difference in emphasis. I want a way to get a non-copying slice, and I’d really like it to be easily accessible—I‘d grumble if you didn’t make it a member, but I’d still use it.
While we're at it, getting a sequence view that can provide an iterator, and all sorts of other nifty features, is great. But making it a callable in itertools (or any other module) wouldn't accomplish that goal.
Hmm, but maybe not that bad:
for i in itertools.seq_view(a_list)[::2]: ...
I still think I prefer this though:
for i in a_list.view[::2]: ...
Agreed. A property on sequences would be best, a wrapper object that takes slice syntax a clear second, and a callable that takes only islice syntax a very distant third. So if the first one is possible, I’m all for it. My slices repo provides the islice API just because it’s easier for slapping together a proof of concept of the slicing part, definitely not because I’d want that added to the stdlib as-is.

However, there is one potential problem with the property I hadn’t thought of until just now: I think people will understand that mylist.view[2:] is not mutable, but will they understand that mystr.view[2:] is not a string? I’m pretty sure that isn’t a problem for seqview(mystr)[2:], but I’m not sure about mystr.view[2:].
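A rough sketch of what the wrapper-object spelling could look like, reusing range objects to do the index arithmetic; the seq_view name and the whole API are hypothetical, not a proposed implementation:

```python
class seq_view:
    """Hypothetical non-copying slice view over a sequence (sketch only)."""

    def __init__(self, seq, rng=None):
        self._seq = seq
        self._rng = range(len(seq)) if rng is None else rng

    def __len__(self):
        return len(self._rng)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing the view just slices the index range; no data is copied.
            return seq_view(self._seq, self._rng[index])
        return self._seq[self._rng[index]]

    def __iter__(self):
        return (self._seq[i] for i in self._rng)

a_list = list(range(100))
v = seq_view(a_list)[10:20][::2]
print(list(v))   # [10, 12, 14, 16, 18]

a_list[10] = -1
print(v[0])      # the view is dynamic, so it sees the change: -1
```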
So to all those questions: I say "yes" except maybe:
"checked by Sequence as an ABC (even though that could be a breaking change)" -- because, well, breaking changes are "Not good".
I wonder if there is a way to make something standard, but not quite break things -- hmm.
For instance: It seems to be possible to have Sequence provide it as a mixin, but not have it checked by Sequence as an ABC?
Actually, now that I think about it, Sequence _never_ checks methods. Most of the ABCs are automatic (structural): any type with the right methods is a subclass. But Sequence and Mapping are opt-in (nominal): only types that inherit from the ABC or register with it are subclasses. There are @abstractmethod checks, but those only test the methods required to use it as a mixin, not all the methods required to meet the protocol (because, when you’re inheriting, you automatically have those other methods so there’s no point checking for them). I think people assume the protocol is everything the mixin provides, whether that’s correct or not, so if you did add a property to the mixin, you’d probably want to put a note in the docs saying that classes that register with Sequence are not required to support that property (but it is nice if you can). That might be (a) all you can do and (b) good enough.
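A quick demonstration that Sequence is nominal (opt-in) rather than structural:

```python
from collections.abc import Sequence

class MySeq:
    """Has all the sequence methods, but that alone is not enough."""
    def __init__(self, data): self._data = list(data)
    def __len__(self): return len(self._data)
    def __getitem__(self, i): return self._data[i]
    def __contains__(self, x): return x in self._data
    def __iter__(self): return iter(self._data)
    def __reversed__(self): return reversed(self._data)
    def index(self, x): return self._data.index(x)
    def count(self, x): return self._data.count(x)

s = MySeq([1, 2, 3])
assert not isinstance(s, Sequence)   # not automatic, unlike e.g. Iterable...

Sequence.register(MySeq)
assert isinstance(s, Sequence)       # ...you must register (or inherit)
```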
For instance, I note that the Mapping ABC has .keys, .items, .values, and there are ABCs for MappingViews, but I can't see if it's defined anywhere that the Mapping methods have to produce View objects.
It’s not, just like it’s not defined that Sequence.__getitem__ has to accept slices or negative indices. The only place I can think of where any protocol is defined in terms of exactly what behavior a method supports, rather than just having the method, is iterable (I can’t remember where it’s documented): a type is iterable if it has __iter__, or it has a __getitem__ that works on all contiguous ints starting from 0 up until it raises IndexError. (But the Iterable ABC just checks for __iter__, so if you want it to accept an old-style iterable you have to register it manually.)

But notice that if you use the Mapping mixin to define the methods for you, it does make sure you get the right views. Maybe that’s sort of a precedent for what you’re looking to do?
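The old-style __getitem__ iterable and the Iterable ABC's behavior, spelled out:

```python
from collections.abc import Iterable

class OldStyle:
    """Iterable via the legacy __getitem__ protocol; no __iter__ at all."""
    def __getitem__(self, i):
        if i >= 3:
            raise IndexError
        return i * 10

# iter() still understands the old protocol:
assert list(OldStyle()) == [0, 10, 20]

# ...but the Iterable ABC only checks for __iter__:
assert not isinstance(OldStyle(), Iterable)

# As the docs suggest, you have to register it manually:
Iterable.register(OldStyle)
assert isinstance(OldStyle(), Iterable)
```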
People know they’re lazy, they know iterators are lazy, so they think they’re a kind of iterator, and the docs don’t ever make it clear why that’s wrong.
Right -- I think a lot of the confusion around the vocabulary is that people think of "iterators" as being "lazy", and the term gets used a lot when laziness is really the key point. (a lot of confusion like that around "generator" as well).
People don’t even have to misuse “generator” to mean “any iterator” or “any lazy thing” to be confusing; it’s always confusing except where something in the context makes it clear. The docs can’t even agree with themselves on whether it means a generator function or a generator iterator, not to mention whether normal functions that return a generator iterator are generator functions (and the recent change to @partial makes that even more fun) and whether types that meet the Generator protocol are generator iterators. I think it’s a testament to how clear the concepts behind this stuff are that people manage to learn and internalize and use them despite the terminology. :)

On Sun, May 10, 2020 at 9:36 PM Andrew Barnert <abarnert@yahoo.com> wrote:

<lots of stuff I agree with, and don't really have any more to say about>

Here is where I think you (Andrew) and I (Chris B.) differ in our goals. My
goal here is to have an easily accessible way to use the slice syntax to get an iterable that does not make a copy.
It’s just a small difference in emphasis. I want a way to get a non-copying slice, and I’d really like it to be easily accessible—I‘d grumble if you didn’t make it a member, but I’d still use it.
Hmm -- I wasn't sure how key the "slice" part was -- there are, of course, other uses for views. But we're on the same page as to preferences.
However, there is one potential problem with the property I hadn’t thought of until just now: I think people will understand that mylist.view[2:] is not mutable, but will they understand that mystr.view[2:] is not a string? I’m pretty sure that isn’t a problem for seqview(mystr)[2:], but I’m not sure about mystr.view[2:].
One more issue around the whole "a string is a sequence of strings" thing :-) Of course, it *could* be a string -- not much difference with immutables. Though I suppose if you took a large slice of a large string, you probably don't want the copy. But what *would* you want to do with it? If you had a view of a slice, and it was a proper view, it might be pretty poky for many string operations, so probably just as well not to have them.

But notice that if you use the Mapping mixin to define the methods for you,
it does make sure you get the right views. Maybe that’s sort of a precedent for what you’re looking to do?
yup -- that does sound like a similar idea. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

I have nothing particularly useful to add, only that this is potentially a really fantastic idea with a lot of promise, IMO.

It would be nice to have some view objects with a lot of functionality that can be sliced not only for efficiency, but for other purposes. One might be (note that below I am assuming that slicing a view returns another view):

nodes = [(0,0), (1,0), (1,1), (1,0)]
triangle1 = [view_of_node_idx0, view_of_node_idx1, view_of_node_idx3]
triangle2 = [view_of_node_idx1, view_of_node_idx2, view_of_node_idx3]

Now if I move the node locations, the triangles reflect the update:

nodes[:] = (1,1), (2,1), (2,2), (2,1)

I even tried implementing something like a simple sequence view myself once, but got stuck trying to reliably slice slices, and couldn't decide what it should mean to return single values from the view (an atomic "slice"? just return the value?), and there are probably all kinds of subtleties way above my knowledge level to consider:

from itertools import islice

class SeqView:
    def __init__(self, seq, sl=slice(None)):
        self.seq = seq
        self.sl = sl

    def __repr__(self):
        return f"{self.__class__.__name__}({self.seq}, {self.sl})"

    def __str__(self):
        return f"{self.seq[self.sl]!s}"

    def __getitem__(self, key):
        if isinstance(key, slice):
            return self.__class__(self.seq, <need to calculate a slice of a slice here>)
        # even if just returning the value, surely this could be much better?
        return list(islice(self.seq, self.sl.start, self.sl.stop, self.sl.step))[key]

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
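For what it's worth, the slice-of-a-slice calculation that gets left as a placeholder there can be done by letting range objects do the index arithmetic; compose_slices below is a hypothetical helper, not a proposed API:

```python
def compose_slices(length, outer, inner):
    """Return one slice equivalent to applying `inner` to seq[outer],
    for a sequence of the given length. Sketch; plain slices only."""
    # range does all the normalization and bounds-clipping for us:
    r = range(length)[outer][inner]
    stop = r.stop
    if r.step < 0 and stop < 0:
        stop = None   # a negative stop would wrap around; None means "before index 0"
    return slice(r.start, stop, r.step)

seq = list(range(20))
outer, inner = slice(2, 18), slice(1, None, 3)
composed = compose_slices(len(seq), outer, inner)
print(seq[composed] == seq[outer][inner])  # True
```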

On Mon, May 11, 2020 at 1:52 AM Ricky Teachey <ricky@teachey.org> wrote:
I have nothing particularly useful to add, only that this is potentially a really fantastic idea with a lot of promise, IMO.
It would be nice to have some view objects with a lot of functionality that can be sliced not only for efficiency, but for other purposes. One might be (note that below I am assuming that slicing a view returns another view):
nodes = [(0,0), (1,0), (1,1), (1,0)]
triangle1 = [view_of_node_idx0, view_of_node_idx1, view_of_node_idx3]
triangle2 = [view_of_node_idx1, view_of_node_idx2, view_of_node_idx3]
Now if I move the node locations, the triangles reflect the update:
nodes[:] = (1,1), (2,1), (2,2), (2,1)
After reading my sent message I decided it probably isn't totally clear: What I mean here is, using a view to construct other objects, such that when the viewed object is updated, the objects making use of the views "see" the update (without having to implement callbacks and observer patterns and all of that kind of thing).

On May 10, 2020, at 21:51, Christopher Barker <pythonchb@gmail.com> wrote:
On Sun, May 10, 2020 at 9:36 PM Andrew Barnert <abarnert@yahoo.com> wrote:
However, there is one potential problem with the property I hadn’t thought of until just now: I think people will understand that mylist.view[2:] is not mutable, but will they understand that mystr.view[2:] is not a string? I’m pretty sure that isn’t a problem for seqview(mystr)[2:], but I’m not sure about mystr.view[2:].
One more issue around the whole "a string is sequence of strings" thing :-) Of course, it *could* be a string -- not much difference with immutables. Though I suppose if you took a large slice of a large string, you probably don't want the copy. But what *would* you want to do with it.
That “string is a sequence of strings” issue, plus the “nothing can duck type as a string“ issue. Here’s an example that I can write in, say, Swift or Rust or even C++, but not in Python: I mmap a giant mailbox file, and I can treat that as a string without copying it anywhere. I split it into a string for each message—I don’t want to copy them all into a list of strings, and ideally I don’t even want to copy one at a time into an iterator or strings because some of them can be pretty huge; I want a list or iterator of views into substrings of the mmap. (This isn’t actually a great example, because even with substring views, the mmap can’t be used as a str in the first place, but it has the virtue of being a real example of code I’ve actually written.)
but if you had a view of a slice, and it was a proper view, it might be pretty poky for many string operations, so probably just as well not to have them.
I think in general people will expect that a slice view on a sequence acts like “some kind of sequence”, not like the same kind they’re viewing—again, they won’t be surprised if you can’t insert into a slice of a list. It’s only with str that I’m worried they might expect more than we can provide, which sucks because str is the one place we _couldn’t_ provide it even if we wanted to. But maybe I’m wrong and people won’t have this assumption, or will be easily cured of it.

On Mon, May 11, 2020 at 10:41:06AM -0700, Andrew Barnert via Python-ideas wrote:
I think in general people will expect that a slice view on a sequence acts like “some kind of sequence”, not like the same kind they’re viewing—again, they won’t be surprised if you can’t insert into a slice of a list.
o_O

For nearly 30 years, we've been able to insert into a slice of a list. I'm going to be *really* surprised if that stops working.

-- Steven

On Thu, May 14, 2020 at 2:58 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, May 11, 2020 at 10:41:06AM -0700, Andrew Barnert via Python-ideas wrote:
I think in general people will expect that a slice view on a sequence acts like “some kind of sequence”, not like the same kind they’re viewing—again, they won’t be surprised if you can’t insert into a slice of a list.
o_O
For nearly 30 years, we've been able to insert into a slice of a list. I'm going to be *really* surprised if that stops working.
At this point, we're thinking a sequence view would be immutable anyway, even for views on immutable objects. So that's a non-issue.

In numpy, there really isn't a view object at all -- there are simply numpy arrays, and any array *may* share the data block with another array. But they are both "proper" arrays.

In [20]: import numpy as np
In [21]: A = np.ones((5,))
In [22]: B = A[:]
In [23]: type(A)
Out[23]: numpy.ndarray
In [24]: type(B)
Out[24]: numpy.ndarray

There is a small distinction: in the above case, A "owns" the data block, but you can only tell if you poke into the flags:

In [27]: A.flags
Out[27]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [28]: B.flags
Out[28]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

And user code very rarely needs to care about that. That flag is mostly used to manage the memory, and prevent dangerous operations:

In [30]: A.resize((3,4))
ValueError: cannot resize an array that references or is referenced by another array in this way.

There are good reasons for ndarrays being able to share data while still being mutable, but I don't think a "normal" Python sequence_view should be mutable -- it would lead to a lot of confusion.

-CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On May 14, 2020, at 03:01, Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, May 11, 2020 at 10:41:06AM -0700, Andrew Barnert via Python-ideas wrote:
I think in general people will expect that a slice view on a sequence acts like “some kind of sequence”, not like the same kind they’re viewing—again, they won’t be surprised if you can’t insert into a slice of a list.
o_O
For nearly 30 years, we've been able to insert into a slice of a list. I'm going to be *really* surprised if that stops working
Which is exactly why Christopher said from the start of this thread, and everyone else has agreed at every step of the way, that we can’t change the default behavior of slicing; we have to instead add some new way to specifically ask for something different.

Well, not _just_ this. There’s also the fact that for 30 years people have been using [:] to mean copy, and the fact that for 30 years people have taken small slices of giant lists and then expected the giant lists to get collected, and so on. But any one of these is enough reason on its own that copy-slicing must remain the default behavior you get from lst[10:20].

Not only that, but whatever gives you view-slicing must look sufficiently different that you notice the difference—and ideally that gives you something you can look up if you don’t know what it means. I think lst.view[10:20] fits that bill.
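A minimal sketch of how a spelling like lst.view[10:20] could behave. The names ViewList, SeqView, and _Slicer here are hypothetical, invented just for illustration -- plain slicing keeps its copying behavior, and only the explicit .view attribute hands out lazy views:

```python
class SeqView:
    """A read-only, lazy view over a slice of an underlying sequence."""
    def __init__(self, seq, start, stop, step):
        self._seq = seq
        self._range = range(start, stop, step)  # range does the index arithmetic

    def __len__(self):
        return len(self._range)

    def __getitem__(self, i):
        # Translate the view index into an index on the underlying sequence.
        return self._seq[self._range[i]]


class _Slicer:
    """The object returned by .view; only supports slice subscripts."""
    def __init__(self, seq):
        self._seq = seq

    def __getitem__(self, s):
        if not isinstance(s, slice):
            raise TypeError("not a slice")
        start, stop, step = s.indices(len(self._seq))
        return SeqView(self._seq, start, stop, step)


class ViewList(list):
    @property
    def view(self):
        return _Slicer(self)


lst = ViewList(range(100))
v = lst.view[10:20]          # lazy view, no copy
print(len(v), v[0], v[-1])   # 10 10 19
print(lst[10:20] == list(range(10, 20)))  # plain slicing still copies: True
```

Nothing here is part of any concrete proposal; it just shows that the spelling is implementable with a small amount of delegation.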

On 14/05/2020 17:47, Andrew Barnert via Python-ideas wrote:
Which is exactly why Christopher said from the start of this thread, and everyone else has agreed at every step of the way, that we can’t change the default behavior of slicing, we have to instead add some new way to specifically ask for something different.
Erm, did someone actually ask for something different? As far as I can tell the original thread OP was asking for islice-maker objects, which don't require the behaviour of slicing to change at all. Quite where the demand for slice views has come from I'm not at all clear. -- Rhodri James *-* Kynesim Ltd

On May 14, 2020, at 10:45, Rhodri James <rhodri@kynesim.co.uk> wrote:
On 14/05/2020 17:47, Andrew Barnert via Python-ideas wrote:
Which is exactly why Christopher said from the start of this thread, and everyone else has agreed at every step of the way, that we can’t change the default behavior of slicing, we have to instead add some new way to specifically ask for something different.
Erm, did someone actually ask for something different? As far as I can tell the original thread OP was asking for islice-maker objects, which don't require the behaviour of slicing to change at all. Quite where the demand for slice views has come from I'm not at all clear.
That doesn’t make any difference here.

If you want slicing sequences to return iterators rather than copies, that would break way too much code, so it’s not going to happen. A different method/property/class/function that gives you iterators would be fine.

If you want slicing sequences to return views rather than copies, that would also break way too much code, so it’s not going to happen. A different method/property/class/function that gives you views would be fine.

Which is why nobody has proposed changing what list.__getitem__, etc. will do.

As for where views came from: because they do everything iterators do plus things they don’t, and in this case they’re about as easy to implement. It’s really the same thing as dict.items. People wanted a dict.items that didn’t copy the whole thing into a giant list. The first suggestion was for an iterator. But that would break too much code, so it couldn’t be done until 3.0. But it was still so useful that it was worth having before 3.x, so it was added to 2.6 with a distinct name, iteritems. But then people realized they could have a view just as easily as an iterator, and it would do more, so that’s what actually went into 3.0. And that turned out to be so useful that it was worth having before 3.x, so, even though iteritems had already been added in 2.6, it was phased out for viewitems in 2.7.

I’m just trying to jump to the end here. Some of the issues aren’t the same (should it be a function or an attribute, is it worth having custom implementations for some builtin types, …), but some of them are, so we can learn from the past instead of repeating the same process. We can just build the equivalent of viewitems right off the bat, and not even think about changing plain slicing (because we never want another 3.0 break).

(Of course there may still be good arguments for why this isn’t the same, or for why it should end up differently even if it _is_ the same.)
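The view behavior that won out over the iterator approach is easy to see in Python 3 itself, where dict.keys/items/values return live views rather than lists or one-shot iterators:

```python
d = {'a': 1}
ks = d.keys()       # a view, not a list and not a one-shot iterator
d['b'] = 2          # mutate the dict after creating the view

print(list(ks))     # ['a', 'b'] -- the view reflects the later mutation
print('b' in ks)    # True -- views also support containment, unlike iterators
```

An iterator created before the mutation would have been exhausted or invalidated; the view keeps working and does more.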

On 14/05/2020 19:56, Andrew Barnert wrote:
On May 14, 2020, at 10:45, Rhodri James<rhodri@kynesim.co.uk> wrote:
On 14/05/2020 17:47, Andrew Barnert via Python-ideas wrote:
Which is exactly why Christopher said from the start of this thread, and everyone else has agreed at every step of the way, that we can’t change the default behavior of slicing, we have to instead add some new way to specifically ask for something different.
Erm, did someone actually ask for something different? As far as I can tell the original thread OP was asking for islice-maker objects, which don't require the behaviour of slicing to change at all. Quite where the demand for slice views has come from I'm not at all clear. That doesn’t make any difference here.
If you want slicing sequences to return iterators rather than copies, that would break way too much code, so it’s not going to happen. A different method/property/class/function that gives you iterators would be fine.
We already have such. It's called itertools.islice().

I'm sorry, but you're missing the point here. You and Christopher seem to be having fun discussing this at great length, and that's fine. However at this point I've not grasped the proposal and I've lost the will to even contemplate the details.

What I have grasped is that no one else has offered much opinion, so saying that "everyone else has agreed at every step of the way" doesn't actually have the weight it pretends to.

-- Rhodri James *-* Kynesim Ltd

A
different method/property/class/function that gives you iterators would be fine.
We already have such. It's called itertools.islice().
If you had read the proposal, you’d know that was brought up, obviously. I'm sorry, but you're missing the point here. You and Christopher seem
to be having fun discussing this at great length, and that's fine. However at this point I've not grasped the proposal and I've lost the will to even contemplate the details.
Fair enough — I need to update the proposal with the new details. What I have grasped is that no
one else has offered much opinion, so saying that "everyone else has agreed at every step of the way" doesn't actually have the weight it pretends to.
Sure, but that was referring to a single point (changing how standard Sequence slicing would work), and it’s not how I would have phrased it. I might have said: "no one has suggested otherwise". If no one is proposing something, it doesn't much matter how many folks have been involved in the conversation :-)

As for not many people having contributed to the conversation, I'm a bit surprised -- there is a LOT of discussion about all kinds of ideas that are never going to see the light of day. Maybe that's a good sign -- if people don't pile on to tell me why it's a bad idea, maybe it has a shot :-) Or it's because I didn't put much text in email, but rather pointed to an external git repo. If/when I can find the time, I'll update my ideas and post again.

-CHB

On Thu, May 14, 2020 at 09:47:36AM -0700, Andrew Barnert wrote:
On May 14, 2020, at 03:01, Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, May 11, 2020 at 10:41:06AM -0700, Andrew Barnert via Python-ideas wrote:
I think in general people will expect that a slice view on a sequence acts like “some kind of sequence”, not like the same kind they’re viewing—again, they won’t be surprised if you can’t insert into a slice of a list.
o_O
For nearly 30 years, we've been able to insert into a slice of a list. I'm going to be *really* surprised if that stops working
Which is exactly why Christopher said from the start of this thread, and everyone else has agreed at every step of the way, that we can’t change the default behavior of slicing, we have to instead add some new way to specifically ask for something different.
Which is why I was so surprised that you suddenly started talking about not being able to insert into a slice of a list rather than a view.
Not only that, but whatever gives you view-slicing must look sufficiently different that you notice the difference—and ideally that gives you something you can look up if you don’t know what it means. I think lst.view[10:20] fits that bill.
Have we forgotten how to look at prior art all of a sudden? Suddenly been possessed by the spirits of deceased Java and Ruby programmers intent on changing the look and feel of Python to make it "real object oriented"? *wink*

We have prior art here:

    b'abcd'.memoryview   # No, not this.
    memoryview(b'abcd')  # That's the one.

    'abcd'.iter   # No, not that either.
    iter('abcd')  # That's it.

In fairness, I do have to point out that dict views do use a method interface, but:

1. Dict views came with a lot of backwards-compatibility baggage; they were initially methods that returned lists; then methods that returned iterators were added, then methods that returned views were added, and finally in 3.x the view methods were renamed and the other six methods were removed.

2. There is only a single builtin mapping object, dict, not like sequences where there are lists, tuples, range objects, strings, byte strings and bytearrays.

3. Dicts need three kinds of view, keys/items/values, not just one; adding three new builtin functions just for dicts is perhaps a bit excessive.

So if we're to add a generic sequence view object, none of those factors are relevant:

1. No backwards-compatibility baggage; we can pick the interface which is the most Pythonic. That's a protocol based on a dunder, not a method.

2. At least six builtins, not one.

3. Only one kind of sequence view, not three.

-- Steven

TL;DR: no need to come to consensus about the most "Pythonic" API for a sequence view -- due to potential name clashes, adding a dunder is pretty much the only option. Details below: On Fri, May 15, 2020 at 3:50 AM Steven D'Aprano <steve@pearwood.info> wrote:
I think lst.view[10:20] fits that bill.
Have we forgotten how to look at prior art all of a sudden? Suddenly been possessed by the spirits of deceased Java and Ruby programmers intent on changing the look and feel of Python to make it "real object oriented"? *wink*
I know you winked there, but frankly, there isn't a clear most Pythonic API here. Surely you don't think Python should have no methods?
We have prior art here:

    b'abcd'.memoryview   # No, not this.
    memoryview(b'abcd')  # That's the one.
That's not the ideal example: memoryviews are a real oddball -- they are designed very much to be supported by third party applications. And a memoryview object is a type, unlike, say, len(). Though I guess we'd be talking about a "view" type here as well.

'abcd'.iter # No, not that either.
iter('abcd') # That's it.
That's closer -- it certainly could have been added to the Iterable ABC.
In fairness, I do have to point out that dict views do use a method
interface, but:
1. Dict views came with a lot of backwards-compatibility baggage;
I think this is really the key point here. Not so much the "baggage", but the fact that dicts (and therefore Mappings) have always had .keys, .values, and .items, so adding the views didn't add or remove any attributes.

2. There is only a single builtin mapping object, dict, not like
sequences where there are lists, tuples, range objects, strings, byte strings and bytearrays.
True, but there are multiple Mapping objects in the Standard Library, and it is the intent that the Mapping ABCs can be used by third party classes -- so I'm not sure that is such a big distinction.

3. Dicts need three kinds of view, keys/items/values, not just one;
adding three new builtin functions just for dicts is perhaps a bit excessive.
Well, one could have created a MappingView object with three attributes. And maybe a full Mapping view would be useful, though I can't think of a use case at the moment.

1. No backwards-compatibility baggage; we can pick the interface which
is the most Pythonic. That's a protocol based on a dunder, not a method.
I disagree here, but come to the same conclusion: adding an attribute to the Sequence ABC will break backward compatibility -- any Sequence subclass that already has an attribute with that name would break.

We can all argue about what the most Pythonic API is, but the fact is that Python has both "OO" APIs and "function-based" APIs. So either one could be acceptable. But when adding a new name, there is a different impact depending on what namespace it is added to:

A) Adding a reserved word is a Really Big Deal -- only done when absolutely necessary (and completely off the table for this one).

B) Adding a name to an ABC is a Big Deal -- it could potentially invalidate any subclasses of that ABC, so suddenly subclasses that worked perfectly fine would be broken. And in the case at hand, numpy arrays do, in fact, already have a .view method that is not the same thing.

C) Adding a builtin name is a Medium Deal, but not too huge -- existing code might shadow it, but that's only an issue if they want to use the new functionality.

D) Adding a new name to a standard library module is a Small Deal -- no third parties should be adding stuff to that namespace anyway (and import * is not recommended). (Not that adding new functionality to the stdlib isn't a lift -- but I'm only talking about names now.)

E) Adding a new dunder is a Medium Deal -- the dunder names are explicitly documented as being reserved, so while folks may (and do) use dunder names in third party libraries, it's their problem if something breaks later on. (For instance, numpy uses a lot of dunders -- though AFAICT they are all "__array_*", so kind of a numpy namespace.)
Taking all that into account, if we want to add "something" to Sequence behavior (in this case a sequence_view object), then adding a dunder is really the only option -- you'd need a really compelling reason to add a Sequence method, and since there are quite a few folks that think that's the wrong approach anyway, we don't have a compelling reason. So IF a sequence_view is to be added, then a dunder is really the only option.

Then we need to decide on where to put the view-creating function (and what to call it). I personally would like to see it as a builtin, but I suspect we won't get a lot of support for that on this list.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

On May 15, 2020, at 13:03, Christopher Barker <pythonchb@gmail.com> wrote:
Taking all that into account, if we want to add "something" to Sequence behavior (in this case a sequence_view object), then adding a dunder is really the only option -- you'd need a really compelling reason to add a Sequence method, and since there are quite a few folks that think that's the wrong approach anyway, we don't have a compelling reason.
So IF a sequence_view is to be added, then a dunder is really the only option.
Once you go with a separate view-creating function (or type), do we even need the dunder?

I’m pretty sure a generic slice-view-wrapper (that just does index arithmetic and delegates) will work correctly on every sequence type. I won’t promise that the one I posted early in this thread does, of course, and obviously we need a bit more proof than “I’m pretty sure…”, but can anyone think of a way a Sequence could legally work that would break this? And I can’t think of any custom features a Sequence might want to add to its view slices (or its view-slice-making wrapper).

I can definitely see how a custom wrapper for list and tuple could be faster, and imagine how real life code could use it often enough that this matters. But if it’s just list and tuple, CPython’s already full of builtins that fast-path on list and tuple, and there’s no reason this one can’t do the same thing.

So, it seems like it only needs a dunder if there are likely to be third-party classes that can do view-slicing significantly faster than a generic view-slicer, and are used in code where it’s likely to matter. Can anyone think of such a case? (At first numpy seems like an obvious answer. Arrays aren’t Sequences, but I think as long as the wrapper doesn’t actually type-check that at __new__ time they’d work anyway. But why would anyone, especially when they care about speed, use a generic viewslice function on a numpy array instead of just using numpy’s own view slicing?)

It seems like a dunder is something that could be added as a refinement in the next Python version, if it turns out to be needed. If so, then, unless we have an example in advance to disprove the YAGNI presumption, why not just do it without the dunder?
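A rough sketch of the kind of generic index-arithmetic-plus-delegation wrapper being discussed (this is not the implementation posted earlier in the thread; the name SliceView is just the one used in this conversation). Using a range object for the index arithmetic makes views-of-views almost free, since slicing a range yields another range:

```python
class SliceView:
    """Generic lazy view over any sequence: delegates item access."""
    def __init__(self, seq, rng=None):
        self.seq = seq  # underlying sequence, kept as a public attribute
        self.rng = range(len(seq)) if rng is None else rng

    def __len__(self):
        return len(self.rng)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # A view of a view: just slice the range, no data is touched.
            return SliceView(self.seq, self.rng[i])
        return self.seq[self.rng[i]]

    def __iter__(self):
        return (self.seq[j] for j in self.rng)


# Works unchanged on str, tuple, range, or any other sequence type:
v = SliceView("abcdefghij")[2:8:2]
print(list(v))        # ['c', 'e', 'g']
print(list(v[1:]))    # ['e', 'g'] -- re-slicing stays lazy
```

Whether such a wrapper really behaves correctly for every legal Sequence is exactly the open question in the message above; this sketch only shows that the basic mechanics need nothing from the wrapped type beyond __len__ and __getitem__.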

On Fri, May 15, 2020 at 5:45 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On May 15, 2020, at 13:03, Christopher Barker <pythonchb@gmail.com> wrote:
Taking all that into account, if we want to add "something" to Sequence
behavior (in this case a sequence_view object), then adding a dunder is really the only option -- you'd need a really compelling reason to add a Sequence method, and since there are quite a few folks that think that's the wrong approach anyway, we don't have a compelling reason.
So IF a sequence_view is to be added, then a dunder is really the only
option.
Once you go with a separate view-creating function (or type), do we even need the dunder?
Indeed -- maybe not. We'd need a dunder if we wanted to make it an "official" part of the Sequence protocol/ABC, but as you point out there may be no need to do that at all. Hmm, more thought needed. -CHB
I’m pretty sure a generic slice-view-wrapper (that just does index arithmetic and delegates) will work correctly on every sequence type. I won’t promise that the one I posted early in this thread does, of course, and obviously we need a bit more proof than “I’m pretty sure…”, but can anyone think of a way a Sequence could legally work that would break this?
And I can’t think of any custom features a Sequence might want to add to its view slices (or its view-slice-making wrapper).
I can definitely see how a custom wrapper for list and tuple could be faster, and imagine how real life code could use it often enough that this matters. But if it’s just list and tuple, CPython’s already full of builtins that fast-path on list and tuple, and there’s no reason this one can’t do the same thing.
So, it seems like it only needs a dunder if there are likely to be third-party classes that can do view-slicing significantly faster than a generic view-slicer, and are used in code where it’s likely to matter. Can anyone think of such a case? (At first numpy seems like an obvious answer. Arrays aren’t Sequences, but I think as long as the wrapper doesn’t actually type-check that at __new__ time they’d work anyway. But why would anyone, especially when they care about speed, use a generic viewslice function on a numpy array instead of just using numpy’s own view slicing?)
It seems like a dunder is something that could be added as a refinement in the next Python version, if it turns out to be needed. If so, then, unless we have an example in advance to disprove the YAGNI presumption, why not just do it without the dunder?
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On May 15, 2020, at 18:21, Christopher Barker <pythonchb@gmail.com> wrote:
On Fri, May 15, 2020 at 5:45 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On May 15, 2020, at 13:03, Christopher Barker <pythonchb@gmail.com> wrote:
Taking all that into account, if we want to add "something" to Sequence behavior (in this case a sequence_view object), then adding a dunder is really the only option -- you'd need a really compelling reason to add a Sequence method, and since there are quite a few folks that think that's the wrong approach anyway, we don't have a compelling reason.
So IF a sequence_view is to be added, then a dunder is really the only option.
Once you go with a separate view-creating function (or type), do we even need the dunder?
Indeed -- maybe not. We'd need a dunder if we wanted to make it an "official" part of the Sequence protocol/ABC, but as you point out there may be no need to do that at all.
That’s actually what triggered this thought. We need collections.abc.Sequence to support the dunder with a default implementation so code using it as a mixin works. What would that default implementation be? Basically just a class whose __getitem__ constructs the thing I posted earlier and that does nothing else. And why would anyone want to override that default?

Being able to override dunders like __contains__ and regular methods like count is useful for multiple reasons: a string-like class needs to extend their behavior for substring searching, a range-like class can implement them without searching at all, etc. But none of those seemed to apply to overriding __viewslice__ (or whatever we’d call it).
Hmm, more thought needed.
Yeah, certainly just because I couldn’t think of a use doesn’t mean there isn’t one. But if I’m right that the dunder could be retrofitted in later (I want to try building an implementation without the dunder and then retrofitting one in along with a class that overrides it, if I get the time this weekend, to verify that it really isn’t a problem), that seems like a much better case for leaving it out. Another point: now that we’re thinking generic function (albeit maybe a C builtin with fast-path code for list/tuple), maybe it’s worth putting an implementation on PyPI as soon as possible, so we can get some experience using it and make sure the design doesn’t have any unexpected holes and, if we’re lucky, get some uptake from people outside this thread.

On May 15, 2020, at 18:21, Christopher Barker <pythonchb@gmail.com> wrote:
Hmm, more thought needed.
Speaking of “more thought needed”, I took a first pass over cleaning up my quick&dirty slice view class and adding the slicer class, and found some bikesheddable options. I think in most cases the best answer is obvious, but I’ve been wrong before. :)

Assume s and t are Sequences of the same type, u is a Sequence of a different type, and vs, vt, and vu are view slices on those sequences. Also assume that we called the view slicer type vslice, and the view slice type SliceView, although obviously those are up for bikeshedding.

When s==t is allowed, is vs==vt? What about vs==t? Same for <, etc.? I think yes, yes, yes.

When s is hashable, is vs hashable? If so, is it the same hash an equivalent copy-slice would have? The answer to == constrains the answer here, of course. I think they can just not be hashable, but it’s a bit weird to have an immutable builtin sequence that isn’t. (Maybe hash could be left out but then added in a future version if there’s a need?)

When s+t is allowed, is vs+t? vs+vt? (Similarly when s+u is allowed, but that usually isn’t.) vs*3? I think all yes, but I’m not sure. (Imagine you create a million view slices but filter them down to just 2, and then concatenate those two. That makes sense, I think.)

Should there be a way to ask vs for the corresponding regular copy slice? Like vslice(s)[10:].strictify() == s[10:]? I’m not sure what it’s good for, but either __hash__ or __add__ seems to imply a private method for this, and then I can’t see any reason to prevent people from calling it. (Except that I can’t think of a good name.)

Should the underlying sequence be a public attribute? It seems easy and harmless and potentially useful, and memoryview has .obj (although dict views don’t have a public reference to the dict).

What about the original slice object? This seems less useful, since you don’t pass around slice objects that often. And we may not actually be storing it.
(The simplest solution is to store slice.indices(len(seq)) instead of the slice.) So I think no.

If s isn’t a Sequence, should vslice(s) be a TypeError? I think we want the C API sequence check, but not the full ABC check.

What does vslice(s)[1] do? I think TypeError('not a slice').

Does the vslice type need any other methods besides __new__ and __getitem__? I don’t think so. The only use for vslice(s) besides slicing it is stashing it to be sliced later, just like the only use for a method besides calling it is stashing it to be called later. But it should have the sequence as a public attribute for debugging/introspection, just like methods make their self and function attributes public.

Is the SliceView type public? (Only in types?) Or is “what the vslice slicer factory creates” an implementation detail, like list_iter? I think the latter.

What’s the repr for a SliceView? Something like vslice([1, 2, 10, 20])[::2] seems most useful, since that’s the way you construct it, even if it is a bit unusual. Although a tiny slice of a giant sequence would then have a giant repr.

What’s the str? I think same as the repr, but will people expect a view of a list/tuple/etc. to look “nice” like list/tuple/etc. do?

Does vs[:] return self? (And, presumably, vs[0:len(s)+100] and so on.) I think so, but that doesn’t need to be guaranteed (just like tuple, range, etc.).

If vs is an instance of a subclass of SliceView, is vs[10:20] a SliceView, or an instance of the subclass? I think the base class, just like tuple, etc.
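On the "store slice.indices(len(seq)) instead of the slice" point: slice.indices is an existing method that normalizes a slice against a given length, resolving None and negative values into a concrete (start, stop, step) triple:

```python
s = slice(None, -2, 2)                 # i.e. the slice written as [:-2:2]
start, stop, step = s.indices(10)      # normalize for a sequence of length 10
print(start, stop, step)               # 0 8 2

# The normalized triple selects exactly the same items as the original slice:
assert list(range(10))[s] == list(range(start, stop, step))
```

Storing the triple rather than the slice object means a view never has to re-interpret None or negative indices later, which is presumably why the message calls it the simplest solution.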

On Fri, May 15, 2020 at 05:44:59PM -0700, Andrew Barnert wrote:
Once you go with a separate view-creating function (or type), do we even need the dunder?
Possibly not. But the beauty of a protocol is that it can work even if the object doesn't define a `__view__` dunder.

- If the object defines `__view__`, call it; this allows objects to return an optimized view, if it makes sense to them; e.g. bytes might simply return a memoryview.

- If not, fall back on a generic view object that just does index arithmetic and delegation.

-- Steven
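The dispatch Steven describes is only a few lines. Everything named here (`__view__`, GenericView, the view() function, the bytes-returns-memoryview choice) is an illustration of the sketch above, not an existing API; note also that dunders are conventionally looked up on the type, not the instance:

```python
class GenericView:
    """Fallback: plain index arithmetic plus delegation to the sequence."""
    def __init__(self, seq):
        self._seq = seq

    def __getitem__(self, s):
        if isinstance(s, slice):
            r = range(*s.indices(len(self._seq)))
            return [self._seq[i] for i in r]  # a real version would stay lazy
        return self._seq[s]


def view(seq):
    # Protocol dispatch: prefer a __view__ dunder if the type defines one,
    # otherwise fall back on the generic wrapper.
    f = getattr(type(seq), '__view__', None)
    if f is not None:
        return f(seq)
    return GenericView(seq)


class Buf(bytes):
    def __view__(self):
        # A bytes-like type opting in to an optimized view, per the sketch.
        return memoryview(self)


print(type(view(Buf(b'abcd'))))  # <class 'memoryview'>
print(view([1, 2, 3, 4])[1:3])   # [2, 3] via the generic fallback
```

Whether memoryview actually has the right API for this role is disputed earlier in the thread; the point here is only the two-step dispatch.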

On May 15, 2020, at 21:35, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, May 15, 2020 at 05:44:59PM -0700, Andrew Barnert wrote:
Once you go with a separate view-creating function (or type), do we even need the dunder?
Possibly not. But the beauty of a protocol is that it can work even if the object doesn't define a `__view__` dunder.
Sure, but if there’s no good reason for any class to provide a __view__ dunder, it’s better not to call one. Which is why I asked—in the message you’re replying to—a bunch of questions to try to determine whether there’s any reason for a class to want to provide an override. I’m not going to repeat the whole thing here; it’s all still in that same message you replied to.
- If the object defines `__view__`, call it; this allows objects to return an optimized view, if it makes sense to them; e.g. bytes might simply return a memoryview.
Not if memoryview doesn’t have the right API, as we discussed earlier in this thread.

But more importantly, if it’s only builtins that will likely ever need an optimization, we can do that inside the functions. That’s exactly what we do in hundreds of places already. Even the one optimization that’s exposed as part of the public C API, PySequence_Fast, isn’t hookable, much less all the functions that fast-path directly on the array in list/tuple or on the split hash table in set/dict/dict_keys and so on. It seems to work well enough in practice, and it’s simpler, and faster for the builtins, and it means we don’t have hundreds of extra dunders (and type slots in CPython) that will almost never be used, and PyPy doesn’t need to write hooks that are actually pessimizations just because they’re optimizations in CPython, and so on.

Of course there might be a reason that doesn’t apply in this case (there obviously is a good reason for non-builtin types to optimize __contains__, for example), but “there might be” isn’t an answer to YAGNI. Especially if we can add the dunder later if someone later finds a need for it.

And honestly, I’m not sure even list and tuple are worth optimizing here. After all, you can’t do the index arithmetic and call to sq_item significantly faster than a generic C function; it only helps if you can avoid the call to sq_item, and I think we can’t do that in any of the most useful cases (at least not without patching up a whole lot more code than we want). But I’ll try it and see if I’m wrong.

On Fri, May 15, 2020 at 01:00:09PM -0700, Christopher Barker wrote:
I know you winked there, but frankly, there isn't a clear most Pythonic API here. Surely you do'nt think PYhton should have no methods?
That's not what I said. Of course Python should have methods -- it's an OOP language after all, and it's pretty hard to have objects unless they have behaviour (methods). Objects with no behaviour are just structs.

But seriously, and this time no winking, Python's design philosophy is very different from that of Java and even Ruby, and protocols are a hugely important part of that. Python without protocols wouldn't be Python, and it would be a much lesser language.

[Aside: despite what the Zen says, I think *protocols* are far more important to Python than *namespaces*.]

Python tends to have shallow inheritance hierarchies; Java has deep ones. Likewise Ruby tends to have related classes inherit from generic superclasses that provide default implementations. If we were like Ruby, there would be no problem: we'd just add a view method to something like object.Collections.Sequence and instantly all lists, tuples, range objects, strings, bytes, bytearrays etc would have that method. But we're not. In practice, each type would have to implement its own view method.

Python tends to use protocol-based top-level functions: len, int, str, repr, bool, iter, list etc are all based on *protocols*, not inheritance. The most notable counter-example to that was `iterator.next`, which turned out to be a mistake and was changed in Python 3 to become a protocol based on a dunder.

That's not to say that methods aren't sometimes appropriate, or that there may not be grey areas where we could go either way. But in general, the use of protocols is such a notable part of Python, and so unusual in other OOP languages, that it trips up newcomers often enough that there is a FAQ about it:

https://docs.python.org/3/faq/design.html#why-does-python-use-methods-for-so...

although the answer is woefully incomplete. See here for a longer version:

http://effbot.org/pyfaq/why-does-python-use-methods-for-some-functionality-e...
There is a *lot* of hate for Python's use of protocols, especially among people who have drunk the "not real object oriented" Koolaid, e.g. see comments here:

https://stackoverflow.com/questions/237128/why-does-python-code-use-len-func...

where this is described as "moronic". Let me be absolutely clear here: the use of protocols, as Python does, is a *brilliant* design, not a flaw, and in my opinion the haters are falling into the Blub trap:

http://paulgraham.com/avg.html

Using protocols looks moronic to them because they haven't seen how they add more power to the language and the coder. All they see are the ugly underscores. Why write a `__len__` method instead of a `len` method? There's no difference except four extra characters. That's some real Blub thinking right there.

Unfortunately, len() hardly takes advantage of the possibilities of protocols, so it's an *obvious* example but not a *good* example. Here's a better example:

py> class NoContains:
...     def __getitem__(self, idx):
...         if idx < 10:
...             return 1000+idx
...         raise IndexError
...
py> 1005 in NoContains()
True

I wrote a class that doesn't define or inherit a `__contains__` method, but I got support for the `in` operator for free just by supporting subscripting. If you don't understand protocols, this is just weird. But that's your loss, not a design flaw.

Another good example is `next()`. When I write an iterator class, I can supply a `__next__` dunder. All it needs to do is provide the next value. I've never needed to add support for default values in a `__next__` method, because the builtin `next()` handles it for me:

    _SENTINEL = object()

    try:
        ...
    except StopIteration:
        if default is not _SENTINEL:
            return default
        raise

I get support for default values for free, thanks to the use of a protocol. If this were Python 2, with a `next` method, I'd have needed to write those six lines a couple of hundred times so far in my life, plus tests, plus documentation.
Multiply that by tens of thousands of Python coders. Some day, if the next() builtin grows new functionality to change the exception raised:

    next(iterator, raise_instead=ValueError)

not one single iterator class out of a million in the world will need to change a single line of code in order to get the new functionality.

This is amazingly powerful stuff when handled properly, and len() is perhaps the most boring and trivial example of it. I'm going to be provocative: if (generic) you are not blown away by the possibilities of protocols, you don't understand them.

-- Steven
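The free default-value behaviour described above is easy to demonstrate with a minimal iterator class (an illustrative sketch; the name `Count3` is made up, not from the thread):

```python
class Count3:
    """Minimal iterator: only __iter__ and __next__, no default handling of its own."""
    def __init__(self):
        self.i = 0

    def __iter__(self):
        return self  # iterators return themselves from __iter__

    def __next__(self):
        if self.i >= 3:
            raise StopIteration
        self.i += 1
        return self.i

it = Count3()
# The builtin next() supplies the default; the class never had to implement it.
assert [next(it, None) for _ in range(5)] == [1, 2, 3, None, None]
```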

On May 15, 2020, at 21:25, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, May 15, 2020 at 01:00:09PM -0700, Christopher Barker wrote:
I know you winked there, but frankly, there isn't a clear most Pythonic API here. Surely you don't think Python should have no methods?
That's not what I said. Of course Python should have methods -- it's an OOP language after all, and it's pretty hard to have objects unless they have behaviour (methods). Objects with no behaviour are just structs.
But seriously, and this time no winking, Python's design philosophy is very different from that of Java and even Ruby and protocols are a hugely important part of that. Python without protocols wouldn't be Python, and it would be a much lesser language.
[Aside: despite what the Zen says, I think *protocols* are far more important to Python than *namespaces*.]
I agree up to this point. But what you’re missing is that Python (even with stdlib stuff like pickle/copy and math.floor) has only a couple dozen protocols, and hundreds and hundreds of methods. Some things should be protocols, but not everything should, or even close. Very few things should be protocols.

More to the point, things should be protocols if and only if they have a specific reason to be a protocol. For example:

1. You need something more complicated than just a single straightforward call, like the fallback behavior for __contains__ and __iter__ with “old-style sequences”, or the whole pickle __getnewargs_ex__ and friends, or __add__ vs. __radd__.

2. Syntax, especially operator overloading, like __contains__ and __add__.

3. The function is so ubiquitously important that you don’t want anything else using the same name for different meanings, like __len__.

(There are probably other good reasons.) When you have a reason like this, you should design a protocol. But when you don’t, dot syntax is the default.

And it’s not just complexity, or “too many builtins” (after all, pickle.dump and math.ceil aren’t builtins). It’s that dot syntax gives you built-in disambiguation that function call syntax doesn’t. If I have a sequence, xs.index(x) has an obvious meaning. But index(xs, x) would not, because index means too many different things (in fact, we already have an __index__ protocol that does one of those different things), and it’s not like len where one of those meanings is so fundamental that we actually want to discourage all the others.

As I said elsewhere, I think we probably can’t have dot syntax in this case for other reasons. But that _still_ doesn’t necessarily mean we need a protocol. If we need to be able to override behavior but we can’t have dot syntax, *that* might be a good reason for a protocol, but either of those on its own is not a good reason, only the combination.
It’s worth comparing C++, where “free functions are part of a class’s interface”. They don’t spell their protocols with underscores, or call them protocols, but the idea is all over the place. x+y tries x.operator+(y) plus various fallbacks. The way you get an iterator is begin(xs), which by default calls xs.begin(), so that’s the standard place to customize it, but there are fallbacks. Converting a C to a D tries (among other things) both C::operator D() and D::D(C). And so on.

But, unlike Python, they don’t try to distinguish what is and isn’t a protocol; the dogma is basically that everything should be a protocol if it possibly can be. Which doesn’t work. They keep trying to solve the compiler-ambiguity problem by adding features like argument-dependent lookup, and almost adding D’s uniform call syntax every 3 years, but none of that will ever solve the human-ambiguity problem. Things like + and begin and swap belong at the top level because they should always mean the same thing even if they have to be implemented differently, but things like draw should be methods because they mean totally different things on different types, and even if the compiler can tell which one is meant, even if an IDE can help you, deck.draw(5) vs. shape.draw(ctx) is still more readable than draw(deck, 5) vs. draw(shape, ctx).

Ultimately, it’s just as bad as Java; it just goes too far in the opposite direction, which is still too far, and that’s what always happens when you’re looking for a perfect and simple dogma that applies to both iter and index so you never have to think about design.
Python tends to use protocol-based top-level functions:
len, int, str, repr, bool, iter, list
etc are all based on *protocols*, not inheritance.
The most notable counter-example to that was `iterator.next` which turned out to be a mistake and was changed in Python 3 to become a protocol based on a dunder.
No, the most notable counter-examples are things like insert, extend, index, count, etc. on sequences; keys, items, update, setdefault, etc. on mappings; add, isdisjoint, etc. on sets; real, imag, etc. on numbers; send, throw, and close on generators… not to mention the dozens of public methods on string and bytes-like types. None of these things are functions that call protocol dunders; they’re all (still) methods that you call directly (or data attributes, in a few cases). And that’s as it should be.

Also, inheritance isn’t even relevant here. List doesn’t inherit index from anywhere. Duck typing already solves the problem of unnecessary inheritance; if that’s the only thing you’re trying to avoid, you don’t need a protocol.
That's not to say that methods aren't sometimes appropriate, or that there may not be grey areas where we could go either way. But in general, the use of protocols is such a notable part of Python, and so unusual in other OOP languages,
And yet, there are still far, far more methods than protocols even in Python. Metaclasses are also a notable part of Python; most OO languages don’t have them, and even in Smalltalk (which I think is the language that mainstreamed the idea) they’re not as powerful and flexible as in Python. But that doesn’t mean most classes in Python should have a custom metaclass. Even things like namedtuple and module that at first glance seem like a case for a metaclass often don’t need one. They’re used whenever they’re useful, not whenever possible. In the same way, Python uses protocols whenever they’re useful, not whenever possible.
that it trips up newcomers often enough that there is a FAQ about it:
https://docs.python.org/3/faq/design.html#why-does-python-use-methods-for-so...
Notice that this question is specifically about the difference between len, which is a protocol, and index, which is not. Even though they’re both things that all sequences support, that different sequences have to implement in different ways, etc., none of that is a sufficient reason to be a protocol.
There is a *lot* of hate for Python's use of protocols, especially among people who have drunk the "not real object oriented" Koolaid
Sure. Notice that C++, PHP, and Go get a lot of the same hate. The difference is that Python doesn’t deserve it, and it’s worth looking at why. In Python terms, C++ tries to make everything you could conceivably need to treat as a protocol work that way; PHP and Go are completely haphazard and arbitrary (and most of the things that look like protocols aren’t actually hookable except by a couple of special classes with compiler support); Python makes most of the things you’d want to be protocols work like protocols, and most things you don’t, not.

Unless you’re willing to learn to think Pythonically, it’s hard to believe that could work—there are an infinite number of things that you could in theory want to hook that you can’t—but in reality it’s a lot easier to identify the things you actually need and get 95% of the way there than to get 100% of what anyone could need in theory. (And then there’s always the incredibly dynamic runtime for when you really, really need some of that last 5%.) It sucks that some people will never get that (and it’s funny that so many of them like Unix…), but for the rest of us, Python works great.

And that’s why we have to approach these questions by looking for the tradeoffs and how they apply to this instance, not looking for the rule that will tell us the answer without having to think. A general argument that protocols can be useful is not an argument that any particular X should be a protocol.

Steven D'Aprano writes:
[Aside: despite what the Zen says, I think *protocols* are far more important to Python than *namespaces*.]
I think you misread the Zen. :-) That-is-my-opinion-I-do-not-however-speak-for-its-author-ly y'rs,

On May 15, 2020, at 03:50, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, May 14, 2020 at 09:47:36AM -0700, Andrew Barnert wrote:
On May 14, 2020, at 03:01, Steven D'Aprano <steve@pearwood.info> wrote:
Which is exactly why Christopher said from the start of this thread, and everyone else has agreed at every step of the way, that we can’t change the default behavior of slicing, we have to instead add some new way to specifically ask for something different.
Which is why I was so surprised that you suddenly started talking about not being able to insert into a slice of a list rather than a view.
We’re talking about slice views. The sentence you quoted and responded to was about the difference between a slice view from a list and a slice view from a string. A slice view from a list may or may not be the same type as a slice view from a tuple (I don’t think there’s a reason to care whether they are or not), but either way, it being immutable will, I think, not surprise anyone. By contrast, a slice view from a string being not stringy _might_ surprise someone.
Not only that, but whatever gives you view-slicing must look sufficiently different that you notice the difference—and ideally that gives you something you can look up if you don’t know what it means. I think lst.view[10:20] fits that bill.
Have we forgotten how to look at prior art all of a sudden? Suddenly been possessed by the spirits of deceased Java and Ruby programmers intent on changing the look and feel of Python to make it "real object oriented"? *wink*
No, we have remembered that language design is not made up of trivial rules like “functions good, methods bad”, but of understanding the tradeoffs and how they apply in each case.
We have prior art here:
    b'abcd'.memoryview    # No, not this.
    memoryview(b'abcd')   # That's the one.
    'abcd'.iter    # No, not that either.
    iter('abcd')   # That's it
In fairness, I do have to point out that dict views do use a method interface,
This is a secondary issue that I’ll come back to, but first: the whole thing that this started off with is being able to use slicing syntax even when you don’t want a copy. The parallel to the prior art is obvious:

    itertools.islice(seq, 10, 20)   # if you don’t care about iterator or view
    sliceviews.slice(seq, 10, 20)   # if you do

The first one already exists. The second one takes 15 lines of code, which I slapped together and posted near the start of the thread. The only problem is that they don’t solve the problem of “use slicing syntax”. But if that’s the entire point of the proposal (at least for Chris), that’s a pretty big problem.

Now, as we’d already been discussing (and as you quoted), you _could_ have a callable like this:

    viewslice(seq)[10:20]

I can write that in only a few more lines than what I posted before, and it works. But it’s no longer parallel to the prior art. It’s not a function that returns a view, it’s a wrapper object that can be sliced to provide a view. There are pros and cons of this wrapper object vs. the property, but a false parallel with other functions is not one of them.
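For concreteness, here is one way such a wrapper might look — a rough sketch under assumed names (`viewslice`, `SliceView`), not the code Andrew actually posted:

```python
from collections.abc import Sequence

class SliceView(Sequence):
    """Read-only view of a slice of an underlying sequence."""
    def __init__(self, seq, start, stop, step):
        self._seq = seq
        self._range = range(start, stop, step)  # maps view indices to seq indices

    def __len__(self):
        return len(self._range)

    def __getitem__(self, i):
        if isinstance(i, slice):
            r = self._range[i]  # slicing a range composes the index mapping
            return SliceView(self._seq, r.start, r.stop, r.step)
        return self._seq[self._range[i]]

class viewslice:
    """Wrapper whose __getitem__ accepts slice syntax and returns a view."""
    def __init__(self, seq):
        self._seq = seq

    def __getitem__(self, s):
        # Assumes s is a slice; clamp it to the sequence's length.
        start, stop, step = s.indices(len(self._seq))
        return SliceView(self._seq, start, stop, step)

v = viewslice(list(range(100)))[10:20]
assert list(v) == list(range(10, 20))
assert list(v[::2]) == [10, 12, 14, 16, 18]  # views of views still work
```

One nice property of this design is that slicing a view yields another view for free, because range objects themselves support slicing.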
1. Dict views came with a lot of backwards-compatibility baggage; they were initially methods that returned lists; then methods that returned iterators were added, then methods that returned views were added, and finally in 3.x the view methods were renamed and the other six methods were removed.
This is, if anything, a reason they _shouldn’t_ have been methods. Changing the methods from 2.6 to 2.7 to 3.x, and in a way that tools like six couldn’t even help without making all of your code a bit uglier, was bad, and wouldn’t have been nearly as much of a problem if we’d just made them all functions in 2.6. And yet, the reasons for them being methods were compelling enough that they remain methods in 3.x, despite that problem. That’s how tradeoffs work.
2. There is only a single builtin mapping object, dict, not like sequences where there are lists, tuples, range objects, strings, byte strings and bytearrays.
Well, there’s also mappingproxy, which is a builtin even if its name is only visible in types. And there are other mappings in the stdlib, as well as popular third-party libraries like SortedContainers. And they all support these methods. There are some legacy third-party libraries never fully updated for 3.x still out there, but they don’t meet the Mapping protocol or its ABC. So, how does this distinction matter?

Note that there is a nearly opposite argument for the wrapper object that someone already made, and that one seems a lot more compelling to me: third-party types. We can’t change them overnight. And some of them might already have an attribute named view, or anything else we might come up with. Those are real negatives with the property design, in a way that “more of the code we _can_ easily change is in the Objects rather than Lib directory of CPython” isn’t.

On Sun, May 10, 2020 at 09:36:14PM -0700, Andrew Barnert via Python-ideas wrote:
for i in itertools.seq_view(a_list)[::2]: ...
I still think I prefer this though:
for i in a_list.view[::2]: ...
Agreed. A property on sequences would be best,
Why? This leads to the same problem that len() solves by being a function, not a method on list and tuple and str and bytes and dict and deque and .... Making views a method or property means that every sequence type needs to implement its own method, or inherit from the same base class, and that's why in the Java world nobody agrees what method to call to get the length of an object.

Python has a long history of using protocols and optional dunders for things like this:

https://lucumr.pocoo.org/2011/7/9/python-and-pola/

So if we are to have a generic view proxy object, as opposed to the very much non-generic dict views, then it ought to be a callable function which, *if necessary*, delegates to a `__view__` dunder.

-- Steven
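A minimal sketch of what that protocol-based design might look like — `__view__` is the dunder Steven names, but everything else here (the `view` function, `DefaultView`, `MyArray`) is made up for illustration:

```python
class DefaultView:
    """Generic fallback 'view' (eager copies here, standing in for a real lazy view)."""
    def __init__(self, seq):
        self._seq = seq

    def __getitem__(self, s):
        if isinstance(s, slice):
            start, stop, step = s.indices(len(self._seq))
            return [self._seq[i] for i in range(start, stop, step)]
        return self._seq[s]

def view(obj):
    """Protocol-style function: delegate to a __view__ dunder if the type defines one."""
    hook = getattr(type(obj), '__view__', None)  # dunders are looked up on the type
    if hook is not None:
        return hook(obj)
    return DefaultView(obj)  # *if necessary*, fall back to a generic proxy

class MyArray:
    """A type that customizes the protocol with its own __view__."""
    def __init__(self, data):
        self.data = data

    def __view__(self):
        return DefaultView(self.data)

assert view([1, 2, 3, 4])[1:3] == [2, 3]           # generic fallback
assert view(MyArray([1, 2, 3, 4]))[::2] == [1, 3]  # delegated to __view__
```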

On Thu, May 14, 2020 at 3:32 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, May 10, 2020 at 09:36:14PM -0700, Andrew Barnert via Python-ideas wrote:
for i in itertools.seq_view(a_list)[::2]: ...
I still think I prefer this though:
for i in a_list.view[::2]:
Agreed. A property on sequences would be best,
Why?
This leads to the same problem that len() solves by being a function, not a method on list and tuple and str and bytes and dict and deque and .... Making views a method or property means that every sequence type needs to implement its own method, or inherit from the same base class, and that's why in the Java world nobody agrees what method to call to get the length of an object.
I'm not a Java guy -- but I'm not sure that's the problem -- it sounds to me like Java does have clearly defined Interfaces (that's the Java concept, yes?) for Sequences, or, in this case, "Sized". Granted, len() as a builtin pre-dates ABCs, but we do have them now, and we do for a reason.
Python has a long history of using protocols and optional dunders for things like this:
Nice post, and I agree for the most part, but frankly, I don't find it convincing. For example: "In Ruby collection objects respond to .size. But because it looks better almost all of them will also respond to .length." And really? Java and Ruby both have these inconsistencies in the standard library? WTF? And why do we not have: "In Python, collection objects respond to the len() function. But because it looks better, almost all of them will also have a .size property."

I think it's more about "Python has a long history of using protocols", which I interpret to mean "people follow standards" (at least in the standard library!), rather than "you have to use built in functions for any operation that might be applicable to more than one type". And despite that history, the ABCs DO have a few regular methods, and a bunch of mixins.

And frankly, I don't really see the difference in terms of ease of implementation -- one way and all Sequences need to implement .view (or use the ABC mixin), and one they need to implement __view__ (or use the ABC mixin). Though as I write this, I realize, of course, that there IS an advantage to __view__ -- the dunders are a reserved namespace, so no one should have a custom Sequence that already has a .__view__ dunder. Whereas third party Sequences *may* already have a .view attribute. Indeed, numpy arrays do, and it does NOT mean the same thing that this would.

Funny, if I go back to that post, it turns out I didn't find the whole "Java and Ruby haven't standardized on how to spell length" argument compelling, but later, he talks about how the dunders are reserved -- and THAT is, indeed, compelling.

So that means a view() function (with maybe a different name) -- however, that brings up the issue of where to put it. I'm not sure that it warrants being in builtins, but where does it belong? Maybe the collections module? And I really think the extra import would be a barrier.
Going back to the whole functions and protocols vs methods argument, the fact is that I don't think there IS a clear line between what belongs where. Let's face it, I don't think any of us would like it if we had to do something like:

    from collections import keys, values
    the_keys = keys(a_dict)

And I think we all agree that moving string functionality into str methods was a really good idea. In fact, I think that when adding stuff to builtins, particularly ABCs, we are stuck not with deciding where it best belongs, but simply with -- it can't be a method if it wasn't there near the beginning of Python's life.
So if we are to have a generic view proxy object, as opposed to the very
much non-generic dict views,
I still don't think that the "genericness" is the point here. A Sequence view isn't really any more generic than a MappingView; the difference in API is that Mappings have had .keys() and .values() and .items() forever, so it was possible to change what they return without potentially breaking other implementations. (That was also a py2-py3 change, which allowed more breakage.) It would be kind of like changing what indexing with a slice meant -- but we are not advocating that!

Growing a language is a challenge!

-CHB

--
Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython

So that means a view() function (with maybe a different name) -- however, that brings up the issue of where to put it. I'm not sure that it warrants being in builtins, but where does it belong? Maybe the collections module? And I really think the extra import would be a barrier.
It occurs to me-- and please quickly shut me down if this is a really dumb idea, I won't be offended-- `memoryview` is already a top-level built-in. I know it has a near completely different meaning with regards to bytes objects than we are talking about with a sequence view object. But could it do double duty as a creator of views for sequences, too? --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On May 14, 2020, at 11:53, Ricky Teachey <ricky@teachey.org> wrote:
So that means a view() function (with maybe a different name) -- however, that brings up the issue of where to put it. I'm not sure that it warrants being in builtins, but where does it belong? Maybe the collections module? And I really think the extra import would be a barrier.
It occurs to me-- and please quickly shut me down if this is a really dumb idea, I won't be offended-- `memoryview` is already a top-level built-in. I know it has a near completely different meaning with regards to bytes objects than we are talking about with a sequence view object. But could it do double duty as a creator of views for sequences, too?
But bytes and bytearray are Sequences, and maybe other things that support the buffer protocol are too. At first glance, it sounds terrible that the same function gives you a locking buffer view for some sequences and an indirect regular sequence view for others, and that there’s no way to get the latter for bytes even when you explicitly want that. But maybe in practice it wouldn’t be nearly as bad as it sounds? I don’t know. It sounds terrible in theory that NumPy arrays are almost but not quite Sequences, but in practice I rarely get confused by that. Maybe the same would be true here? There’s also the problem that “memoryview” is kind of a misleading name if you apply it to, say, a range instead of a list. But again, I’m not sure how bad that would be in practice.

On May 14, 2020, at 03:35, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, May 10, 2020 at 09:36:14PM -0700, Andrew Barnert via Python-ideas wrote:
for i in itertools.seq_view(a_list)[::2]: ...
I still think I prefer this though:
for i in a_list.view[::2]: ...
Agreed. A property on sequences would be best,
Why?
Because the whole point of this is for something to apply slicing syntax to. And compare:

    lst.view[10:20]
    view(lst)[10:20]
    view(lst, 10, 20)

The last one is clearly the worst, because it doesn’t let you use slicing syntax. The others are both OK, but the first seems the most readable. I’ll give more detailed reasons below. (There may be reasons why it can’t or shouldn’t be done, which is why I ranked all of the options in order rather than just insisting that we must have the first one or I hate the whole idea.)
This leads to the same problem that len() solves by being a function, not a method on list and tuple and str and bytes and dict and deque and .... Making views a method or property means that every sequence type needs to implement it's own method, or inherit from the same base class,
But len doesn’t solve that problem at all, and isn’t meant to. It just means that every sequence type has to implement __len__ instead of every sequence type having to implement len.

Protocols often provide some added functionality. iter() doesn’t just call __iter__, it can also fall back to old-style sequence methods, and it has the 2-arg form. Similarly, str() falls back to __repr__, and has other parameter forms, and doubles as the constructor for the string type. And next() even changed from being a normal method to a protocol and function, breaking backward compatibility, specifically to make it easier to do the 2-arg form. But len() isn’t like that. There is no fallback, no added behavior, nothing. It doesn’t add anything. So why do we have it?

Guido’s argument is in the FAQ. It starts off with “For some operations, prefix notation just reads better than postfix”. He then backs up the general principle that this is sometimes true by appeal to math. And then he explains the reasons this is one of those operations by arguing that “len” is the most important piece of information here, so it belongs first.

It’s the same principle here, but the specific answer is different. View-ness is not more important than the sequence and the slicing, so it doesn’t call out to be fronted. In fact, view-ness is (at least in the user’s mind) strongly tied to the slicing, so it calls out to be near the slice.

And it’s not like this is some unprecedented thing. Most of the collection types, and corresponding ABCs, have regular methods as well as protocol dunders. Is anyone ever confused by having to write xs.index(x) instead of index(xs, x)? I don’t think so. In fact, I think the latter would be _more_ confusing, because “index” has so many different meanings that “list.index” is useful to nail it down. (Notice that we already _have_ a dunder named __index__, and it does something totally different…) And the same is true for “view”.
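That __index__ aside is easy to illustrate: list.index searches for a value, while the __index__ protocol converts an object to an integer for use *as* an index — two unrelated meanings sharing one word (a small made-up example; `Nat` is not from the thread):

```python
import operator

class Nat:
    """Toy integer-like object that supports the __index__ protocol."""
    def __init__(self, n):
        self.n = n

    def __index__(self):
        return self.n  # lets this object be used anywhere an integer index is needed

xs = ['a', 'b', 'c']
assert xs.index('b') == 1           # the method: search for a value
assert xs[Nat(2)] == 'c'            # the protocol: convert the object to an index
assert operator.index(Nat(2)) == 2  # explicit use of the protocol
```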
In fact, everything in your argument is so generic that it acts as an argument against not just .index() but against any public methods or attributes on anything. Obviously you didn’t intend it that way, but once you actually target it so that it argues against .len() but not .index(), I don’t think there’s any argument against .view left.
and that's why in the Java world nobody agrees what method to call to get the length of an object.
Nobody can agree on what function to call in C or PHP even though they’re functions rather than methods in those languages. Everyone can agree on what method to use in C++ and Smalltalk even though they’re methods in those languages, just like Java. (In fact, C++ even loosely enforces consistency the same way Python loosely does, except at compile time instead of run time—if your class doesn’t have a size() method, it doesn’t duck type as a collection and therefore can’t be used in templates that want a collection.) Or just look at Python: nobody is confused about how to spell the .index method even though it’s a method. So the problem in Java has nothing to do with methods. (We don’t have to get into what’s wrong with Java here; it’s not relevant.)
So if we are to have a generic view proxy object, as opposed to the very much non-generic dict views, then it ought to be a callable function
We don’t actually _know_ how generic it can/should be yet. That’s something we’ve been discussing in this thread. It might well be a quality-of-implementation issue that has different best answers in different Pythons. Or it might not. It’s not obvious. Which implies that whatever the answer is, it’s not something that people should have to grasp it to understand the feature. You wouldn’t want to users to base their understanding of iter on knowing whether there’s one generic sequence iterator type or one for each type (especially since neither is true in CPython, but something halfway between and more complicated). And I think the same is true here. So you’re arguing for a callable function because it strongly implies a generic implementation, but I see that as an argument _against_ a function, not for it, and I also don’t think the argument holds anyway because it doesn’t imply any such thing for iter.

Andrew Barnert via Python-ideas writes:
Which is why it's not wrong to say that a range object is an iterator, but it IS wrong to say that it's just an iterator ...
No, they’re not iterators. You’ve got it backward—every iterator is an iterable, but most iterables are not iterators.
An iterator is an iterable that has a __next__ method and returns self from __iter__. List, tuples, dicts, etc. are not iterators, and neither are ranges, or the dict views.
[example snipped]
A lot of people get this confused. I think the problem is that we don’t have a word for “iterable that’s not an iterator”,
I think part of the problem is that people rarely see explicit iterator objects in the wild. Most of the time we encounter iterator objects only implicitly. Nomenclature *is* a problem (I still don't know what a "generator" is: a function that contains "yield" in its def, or the result of invoking such a function), but part of the reason for that is that Python successfully hides objects like iterators and generator objects much of the time (I use generator expressions a lot, "yield" rarely).
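The iterator/iterable distinction being discussed can be checked directly against the ABCs (an illustration, not code from the thread):

```python
from collections.abc import Iterable, Iterator

xs = [1, 2, 3]
assert isinstance(xs, Iterable) and not isinstance(xs, Iterator)

it = iter(xs)
assert isinstance(it, Iterator)
assert iter(it) is it            # an iterator returns itself from __iter__
assert iter(xs) is not iter(xs)  # a list hands out a fresh iterator each time

r = range(3)
assert not isinstance(r, Iterator)      # range is an iterable, not an iterator
assert list(r) == [0, 1, 2] == list(r)  # so it can be iterated repeatedly
```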
or for the refinement “iterable that’s not an iterator and is reusable”, much less the further refinement “iterable that’s reusable, providing a distinct iterator that starts from the head each time, and allows multiple such iterators in parallel”.
Aside: Does "multiple parallel iterators" add anything to "distinct iterator that starts from the head each time"? Or did you mean what I would express as "and *so* it allows multiple parallel iterators"?
But that last thing is exactly the behavior you expect from “things like list, dict, etc.”, and it’s hard to explain, and therefore hard to document.
Um, you just did *explain* it, quite well IMHO, you just didn't *name* it. ;-)
The closest word for that is “collection”, but Collection is also a protocol that adds being a Container and being Sized on top of being Iterable, so it’s misleading unless you’re really careful. So the docs don’t clearly tell people that range, dict_keys, etc. are exactly that “like list, dict, etc.” thing, so people are confused about what they are. People know they’re lazy, they know iterators are lazy,
I'm not sure what "lazy" means here. range is lazy: the index it reports doesn't exist anywhere in the program's data until it computes it. But I wouldn't call a dict view "lazy" any more than I'd call the underlying dict "lazy". Views are references, or alternative access interfaces if you like. But the data for the view already exists.
so they think they’re a kind of iterator, and the docs don’t ever make it clear why that’s wrong.
I don't think the problem is in the docs. Iterators and views aren't the only things that are lazy, here. People are even lazier! :-) Of course that's somewhat unfair, but in a technical sense quite true: most people don't read the docs until they run into trouble getting the program to behave as they want.
There are no types in Python’s stdlib that have the behavior you suggested of being an iterator but resetting each time you iterate. (The closest thing is file objects, but you have to manually reset them with seek(0).)
Isn't manual reset exactly what you want from a resettable iterator, though? Sometimes you want an iterator on a file to reset (Emacs reads the last block of a Lisp library looking for a local variables block, then rereads the library to load it as Lisp -- you could do this with sequential access), and sometimes you want an interruptible iterator (mail message: read From_, read header, read body), and sometimes you want both (mbox file, you want to "unread" the From_ line of the next message). I guess there are cases where you want to read a prefix repeatedly (eg, simulation of different models on the same underlying pseudo-random sequence), but I think they're very specialized. Steve
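The manual-reset behaviour of file objects can be seen with io.StringIO, which iterates line-by-line like a real file (a small illustration):

```python
import io

f = io.StringIO("a\nb\nc\n")
assert list(f) == ["a\n", "b\n", "c\n"]
assert list(f) == []  # the iterator is exhausted...
f.seek(0)             # ...until you reset it manually
assert list(f) == ["a\n", "b\n", "c\n"]
```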

On May 10, 2020, at 22:36, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andrew Barnert via Python-ideas writes:
A lot of people get this confused. I think the problem is that we don’t have a word for “iterable that’s not an iterator”,
I think part of the problem is that people rarely see explicit iterator objects in the wild. Most of the time we encounter iterator objects only implicitly.
We encounter iterators in the wild all the time, we just don’t usually _care_ that they’re iterators instead of “some kind of iterable”, and I think that’s the key distinction you’re looking for. Certainly when you open a file, you usually deal with the file object. And whenever you feed the result of one genexpr into another, or into a map call, you are using an iterator. You often even store those iterators in variables. But if you change that first genexpr to a listcomp (say, because you want to be able to breakpoint there and print it to the debugger, or dump it to a log), nothing changes except performance. And people know this and take advantage of it without even thinking. And that’s true of the majority of places you use iterators. Code that explicitly needs an iterator (like the grouper idiom where you zip an iterator with itself) certainly does exist, but it’s nowhere near as common as code that can use any iterable and only uses an iterator because that’s the easiest thing to write or the most efficient thing. This is a big part of what I meant about the concepts being so nice that people manage to use them despite not being able to talk about them.
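The grouper idiom mentioned above — zipping an iterator with itself — only works because an iterator is consumed as it is zipped, which is exactly the kind of code that explicitly needs an iterator rather than any iterable (a quick illustration):

```python
it = iter(range(10))
# Each tuple pulls two consecutive items from the same underlying iterator.
pairs = list(zip(it, it))
assert pairs == [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
```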
Nomenclature *is* a problem (I still don't know what a "generator" is: a function that contains "yield" in its def, or the result of invoking such a function), but part of the reason for that is that Python successfully hides objects like iterators and generator objects much of the time (I use generator expressions a lot, "yield" rarely).
You’re right. The fact that the concept (and the implementation of those concepts) is so nice that we rarely have to think about these things explicit is actually part of the reason it’s hard to do so on the rare occasions we need to. And put that way, it’s a pretty good tradeoff. Still, having clear names with simple definitions would help that problem without watering down the benefits.
or for the refinement “iterable that’s not an iterator and is reusable”, much less the further refinement “iterable that’s reusable, providing a distinct iterator that starts from the head each time, and allows multiple such iterators in parallel”.
Aside: Does "multiple parallel iterators" add anything to "distinct iterator that starts from the head each time"? Or did you mean what I would express as "and *so* it allows multiple parallel iterators"?
I’m being redundant here to make sure I’m understood, because just saying it the second way apparently didn’t get the idea across the first time.
But that last thing is exactly the behavior you expect from “things like list, dict, etc.”, and it’s hard to explain, and therefore hard to document.
Um, you just did *explain* it, quite well IMHO, you just didn't *name* it. ;-)
Well, it was a long, and redundant, explanation, not something you’d want to see in the docs or even a PEP.
The closest word for that is “collection”, but Collection is also a protocol that adds being a Container and being Sized on top of being Iterable, so it’s misleading unless you’re really careful. So the docs don’t clearly tell people that range, dict_keys, etc. are exactly that “like list, dict, etc.” thing, so people are confused about what they are. People know they’re lazy, they know iterators are lazy,
I'm not sure what "lazy" means here. range is lazy: the index it reports doesn't exist anywhere in the program's data until it computes it. But I wouldn't call a dict view "lazy" any more than I'd call the underlying dict "lazy". Views are references, or alternative access interfaces if you like. But the data for the view already exists.
“lazy” as in it creates something that acts like a list or a set, but hasn’t actually stored a list or set or other data structure in memory or done a bunch of up-front CPU work. You’re right that a more precise definition would probably include range but not dict_keys, but I think people do use it in a way that includes both, and that’s part of the reason they’re equally confused into thinking both are iterators.
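The distinction being blurred here can be checked directly against the real collections.abc protocols. A quick sketch (an editorial illustration, not part of the original message): range and dict_keys are lazy but reusable Collections, while a generator is an Iterator.

```python
from collections.abc import Collection, Iterator

r = range(10)                     # lazy, but not an iterator
keys = {"a": 1, "b": 2}.keys()    # a view: also not an iterator
gen = (x * x for x in range(10))  # a generator IS an iterator

for obj in (r, keys):
    assert isinstance(obj, Collection)   # Iterable + Sized + Container
    assert not isinstance(obj, Iterator)
    assert list(obj) == list(obj)        # reusable: two full passes agree

assert isinstance(gen, Iterator)
assert not isinstance(gen, Collection)
assert list(gen) and not list(gen)       # one pass, then exhausted
```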
so they think they’re a kind of iterator, and the docs don’t ever make it clear why that’s wrong.
I don't think the problem is in the docs. Iterators and views aren't the only things that are lazy, here. People are even lazier! :-) Of course that's somewhat unfair, but in a technical sense quite true: most people don't read the docs until they run into trouble getting the program to behave as they want.
Well, yes, but people writing proposals to change the language or designing PyPI libraries to extend it or writing StackOverflow answers to help other people learn it are getting it wrong, not just people using it day to day. Even when they cite quotes out of the docs, they still often get it wrong. Which makes me think the docs really are part of the problem. And not having names for things, even if they _are_ well explained somewhere, makes that problem hard to solve. A shorthand description is usually vague and it’s not clear where to go to get clarification; a name is at least as vague but it’s obvious what to search for to get the exact definition (if there’s not already a link right there).
There are no types in Python’s stdlib that have the behavior you suggested of being an iterator but resetting each time you iterate. (The closest thing is file objects, but you have to manually reset them with seek(0).)
Isn't manual reset exactly what you want from a resettable iterator, though?
Yes. I certainly use seek(0) on files, and it’s a perfectly cromulent concept, it’s just not the concept I’d want on a range or a keys view or a sequence slice.

Andrew Barnert writes:
On May 10, 2020, at 22:36, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andrew Barnert via Python-ideas writes:
A lot of people get this confused. I think the problem is that we don’t have a word for “iterable that’s not an iterator”,
I think part of the problem is that people rarely see explicit iterator objects in the wild. Most of the time we encounter iterator objects only implicitly.
We encounter iterators in the wild all the time, we just don’t usually _care_ that they’re iterators instead of “some kind of iterable”, and I think that’s the key distinction you’re looking for.
It *is* the distinction I'm making with the word "explicit". I never use "next" on an open file. I'm not sure your more precise statement is better. I think the real difference is that I'm thinking of "people" as including my students who have no clue what an iterator does and don't care what an iterable is, they just cargo cult

with open("file") as f:
    for line in f:
        do_stuff(line)

while as you point out (and I think is appropriate in this discussion) some people who are discussing proposed changes are using the available terminology incorrectly, and that's not good.
Still, having clear names with simple definitions would help that problem without watering down the benefits.
I disagree. I agree there's "amortized zero" cost to the crowd who would use those names fairly frequently in design discussions, but there is a cost to the "lazy in the technical sense" programmer, who might want to read the documentation if it gave "simple answers to simple questions", but not if they have to wade through a thicket of "twisty subtle definitions all alike" to get to the simple answer, and especially not if it's not obvious after all that what the answer is. It also makes conversations with experts fraught, as those experts will tend to provide more detail and precision than the questioner wants (speaking for myself, anyway!) "Not every one-sentence explanation needs terminology in the documentation."
But that last thing is exactly the behavior you expect from “things like list, dict, etc.”, and it’s hard to explain, and therefore hard to document.
Um, you just did *explain* it, quite well IMHO, you just didn't *name* it. ;-)
Well, it was a long, and redundant, explanation, not something you’d want to see in the docs or even a PEP.
The part I was referring to was the three or so lines preceding in which you defined the behavior desired for views etc. I guess to define terminology for all the variations that might be relevant would be long (and possibly unavoidably redundant).
“lazy” as in it creates something that acts like a list or a set, but hasn’t actually stored a list or set or other data structure in memory or done a bunch of up-front CPU work. You’re right that a more precise definition would probably include range but not dict_keys, but I think people do use it in a way that includes both, and that’s part of the reason they’re equally confused into thinking both are iterators.
This is another reason why I am not optimistic that more (and preferably, better ;-) terminology would help. We're already abusing terms that have fairly precise definitions in an analogous but wrong context. And there are better analogies. Instead of saying "views are lazy X" (I don't even know what X is being made lazy here!), we could borrow from Scheme and say views are "hygienic aliases". But we don't. Before we invent more terms for Humpty Dumpty to abuse, we should teach Humpty Dumpty a thing or two about the words he already knows.
And not having names for things, even if they _are_ well explained somewhere, makes that problem hard to solve. A shorthand description is usually vague and it’s not clear where to go to to get clarification; a name is at least as vague but it’s obvious what to search for to get the exact definition (if there’s not already a link right there).
In principle, I agree. In practice, nothing's perfect, and there are countervailing issues (especially misuse of the new names).
Isn't manual reset exactly what you want from a resettable iterator, though?
Yes. I certainly use seek(0) on files, and it’s a perfectly cromulent concept, it’s just not the concept I’d want on a range or a keys view or a sequence slice.
But you *don't* use seek(0) on files (which are not iterators, and in fact don't actually exist inside of Python, only names for them do). You use them on opened *file objects* which are iterators. When you open a file again, by default you get a new iterator which begins at the beginning, as you want for those others. My point is that none of the other types you mention are iterators. The difference with files is just that they happen to exist in Python as iterables. But after

r = range(n)
ri = iter(r)
for i in ri:
    if i > n_2:
        break

you want the next "for j in ri:" to start where you left off, no? Did you confuse iterable with iterator, or did I miss your point, or is there a third possibility? ;-)

Steve
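Spelled out runnably (a small editorial sketch with concrete values; `n_2` in the snippet presumably stands for some cutoff, here `n // 2`), the explicit iterator really does resume where the first loop left off:

```python
n = 10
r = range(n)
ri = iter(r)              # an explicit iterator over the range

first = []
for i in ri:
    if i > n // 2:        # stop partway through; note that the break
        break             # consumes the value 6 without keeping it
    first.append(i)

rest = list(ri)           # resumes where the first loop left off
assert first == [0, 1, 2, 3, 4, 5]
assert rest == [7, 8, 9]
```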

On May 12, 2020, at 23:29, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andrew Barnert writes:
On May 10, 2020, at 22:36, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andrew Barnert via Python-ideas writes:
A lot of people get this confused. I think the problem is that we don’t have a word for “iterable that’s not an iterator”,
I think part of the problem is that people rarely see explicit iterator objects in the wild. Most of the time we encounter iterator objects only implicitly.
We encounter iterators in the wild all the time, we just don’t usually _care_ that they’re iterators instead of “some kind of iterable”, and I think that’s the key distinction you’re looking for.
It *is* the distinction I'm making with the word "explicit". I never use "next" on an open file. I'm not sure your more precise statement is better.
I think the real difference is that I'm thinking of "people" as including my students who have no clue what an iterator does and don't care what an iterable is, they just cargo cult
with open("file") as f:
    for line in f:
        do_stuff(line)
while as you point out (and I think is appropriate in this discussion) some people who are discussing proposed changes are using the available terminology incorrectly, and that's not good.
Students often want to know why this doesn’t work:

with open("file") as f:
    for line in f:
        do_stuff(line)
    for line in f:
        do_other_stuff(line)

… when this works fine:

with open("file") as f:
    lines = f.readlines()
for line in lines:
    do_stuff(line)
for line in lines:
    do_other_stuff(line)

This question (or a variation on it) gets asked by novices every few days on StackOverflow; it’s one of the most common duplicates. The answer is that files are iterators, while lists are… well, there is no word. You can explain it anyway. In fact, you _have_ to give an explanation with analogies and examples and so on, and that would be true even if there were a word for what lists are. But it would be easier to explain if there were such a word, and if you could link that word to something in the glossary, and a chapter in the tutorial.
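The same exhaustion behavior can be demonstrated without touching the filesystem, using io.StringIO, which, like a real file object, is its own iterator (an editorial sketch, not part of the original message):

```python
import io

f = io.StringIO("one\ntwo\nthree\n")
assert iter(f) is f                  # a file object is its own iterator

first_pass = list(f)                 # consumes the iterator...
second_pass = list(f)                # ...so the second loop sees nothing
assert first_pass == ["one\n", "two\n", "three\n"]
assert second_pass == []

lines = ["one\n", "two\n", "three\n"]
assert iter(lines) is not lines      # a list hands out a fresh iterator
assert list(lines) == list(lines)    # so it can be looped over twice
```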
Still, having clear names with simple definitions would help that problem without watering down the benefits.
I disagree. I agree there's "amortized zero" cost to the crowd who would use those names fairly frequently in design discussions, but there is a cost to the "lazy in the technical sense" programmer, who might want to read the documentation if it gave "simple answers to simple questions", but not if they have to wade through a thicket of "twisty subtle definitions all alike" to get to the simple answer, and especially not if it's not obvious after all that what the answer is.
We shouldn’t define everything up front, just the most important things. But this is one of the most important things. People need to understand this distinction very early on to use Python, and many of them don’t get it, hence all the StackOverflow duplicates. People run into this problem well before they run into a problem that requires them to understand the distinction between arguments and parameters, or protocols and ABCs, or Mapping and dict.
It also makes conversations with experts fraught, as those experts will tend to provide more detail and precision than the questioner wants (speaking for myself, anyway!) "Not every one-sentence explanation needs terminology in the documentation."
I think it’s the opposite. I can teach a child why a glass will break permanently when you hit it while a lake won’t by using the words “solid” and “liquid”. I don’t have to give them the scientific definitions and all the equations. I might not even know them. And in the same way, I can teach novices why the x after x=y+1 doesn’t change when y changes by teaching them about variables without having to explain __getattr__ and fast locals and the import system and so on. Knowing all the subtleties of shear force or __getattribute__ or whatever doesn’t prevent me from teaching a kid without getting into those subtleties. The better I understand “solid” or “variable”, the easier it is for me to teach it. That’s how words work, or how the human mind works, or whatever, and that’s why language is useful for teaching.
But that last thing is exactly the behavior you expect from “things like list, dict, etc.”, and it’s hard to explain, and therefore hard to document.
Um, you just did *explain* it, quite well IMHO, you just didn't *name* it. ;-)
Well, it was a long, and redundant, explanation, not something you’d want to see in the docs or even a PEP.
The part I was referring to was the three or so lines preceding in which you defined the behavior desired for views etc. I guess to define terminology for all the variations that might be relevant would be long (and possibly unavoidably redundant).
Yes, and defining terminology for the one distinction that almost always is relevant helps distinguish that distinction from the other ones that rarely come up. Most people (especially novices) don’t often need to think about the distinction between iterables that are sized and also containers vs. those that are not both sized and containers, so the word for that doesn’t buy us much. But the distinction between iterators and things-like-list-and-so-on comes up earlier, and a lot more often, so a word for that would buy us a lot more.
Isn't manual reset exactly what you want from a resettable iterator, though?
Yes. I certainly use seek(0) on files, and it’s a perfectly cromulent concept, it’s just not the concept I’d want on a range or a keys view or a sequence slice.
But you *don't* use seek(0) on files (which are not iterators, and in fact don't actually exist inside of Python, only names for them do). You use them on opened *file objects* which are iterators.
A file object is a file, in the same way that a list object is a list and an int object is an int. Sure, those are all abstractions, and some are quite vague, and occasionally it’s worth talking specifically about Python’s implementation of the abstraction. An int doesn’t have a storage cost; an int object does. A file doesn’t have a fileno, a file object does. But so what? The fact that we use “file” ambiguously for a bunch of related but contradictory abstractions (a stream that you can read or write, a directory entry, the thing an inode points to, a document that an app is working on, …) makes it a bit more confusing, but unfortunately that ambiguity is forced on people before they even get to their first attempt at programming, so it’s probably too late for Python to help (or hurt).
When you open a file again, by default you get a new iterator which begins at the beginning, as you want for those others. My point is that none of the other types you mention are iterators.
I don’t get what you’re driving at here. Lists, sets, ranges, dict_keys, etc. are not iterators. You can write `for x in xs:` over and over and get the values over and over. Because each time, you get a new iterator over their values. Files, maps, zips, generators, etc. are not like that. They’re iterators. If you write `for x in xs:` twice, you get nothing the second time, because each time you’re using the same iterator, and you’ve already used it up. Because iter(xs) is xs when it’s a file or generator etc.
The difference with files is just that they happen to exist in Python as iterables. But after
_What_ exists in Python as iterables? The only representation of files in Python is file objects—the thing you get back from open (or socket.makefile or io.StringIO or whatever else)—and those are iterators.
r = range(n)
ri = iter(r)
for i in ri:
    if i > n_2:
        break
you want the next "for j in ri:" to start where you left off, no?
Yes. That’s why you called iter, after all. Because doing `for i in r:` twice would _not_ start where you left off. Because a range is not an iterator. But file isn’t like that—you don’t have to call iter on it to get an iterator; in fact, if you write fi = iter(f), fi is the same object as f. Because a file is an iterator.

Of course you can also get a new range with r = range(n) again, but you don’t have to, because one range(n) is as good as another. But one range_iter is not as good as another, because there’s no way to use one without using it up. And files aren’t like ranges, they’re like range_iters. Compare these:

xs = [x*2 for x in range(10)]
ys = (y*2 for y in range(10))

Of course you can sort of iterate over ys twice by just running the same generator expression again to get a brand new object, but that’s not the same thing as iterating over xs twice. That’s not “resetting the iterator”, it’s creating a brand new one. In the same way, you can sort of iterate over a file twice just by running the expression that created it twice, but that’s not resetting the file object, it’s creating a new one.

The one difference between files and generators is that you can actually reset the file object by calling seek(0). But that doesn’t make file not an iterator. It just makes file an iterator with an extra feature that most iterators don’t have. If “resettable iterator” means anything useful, it means something like file. Claiming that dict_keys is a “resettable iterator” because you can iterate over it twice is massively confusing, because it’s not an iterator at all, it’s the exact same kind of thing as a list or a range. And I’m pretty sure that’s exactly the confusion that led you to think that dict_keys have weird behavior, and to suggest the same weird behavior for sequence views.
Like thinking you can’t have two different iterators over the dict_keys that point to different positions—if it were an iterator, that would be true (notice that it’s true of files—if you call iter on a file twice, they will always have the same position, because they’re both actually the same object as file itself), but because dict_keys is not an iterator, it’s not true.
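All of these claims are directly checkable; a quick editorial sketch:

```python
r = range(5)
assert iter(r) is not r                 # range: a fresh iterator each time
assert list(r) == list(r) == [0, 1, 2, 3, 4]

it = iter(r)                            # a range_iterator, by contrast,
assert iter(it) is it                   # is its own iterator...
assert list(it) == [0, 1, 2, 3, 4]
assert list(it) == []                   # ...and can only be used up once

xs = [x * 2 for x in range(3)]          # listcomp: a reusable list
ys = (y * 2 for y in range(3))          # genexpr: a one-shot iterator
assert list(xs) == [0, 2, 4] and list(xs) == [0, 2, 4]
assert list(ys) == [0, 2, 4] and list(ys) == []
```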

It took me a good while to "get" the distinction between an iterator and an iterable, and I still misuse those terms sometimes. Maybe because iterable is an awkward word (that my spell checker doesn't recognize)? But it's also because there is a clear definition for "Iterator" in Python, but the term is used a bit more generally in vague CS nomenclature.

The other confusion is that an iterable is not an iterator, but iterators are, in fact, iterables (i.e. you can call iter() on them). I think this is mostly the result of the "for loop" protocol pre-dating the iteration protocol, and wanting to have the same nifty way to iterate everything. That is -- we want to be able to use iterators in for loops, and not have to call iter() on anything before using a for loop. But in fact, I think this is a nice convenience, and maybe one that would be kept in a new language anyway -- it's really handy that you can do A LOT without knowing about iter() and next() and StopIteration, while those tools are still there when needed.

OK -- THAT was a digression.

On Wed, May 13, 2020 at 10:52 AM Andrew Barnert <abarnert@yahoo.com> wrote:
On May 12, 2020, at 23:29, Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
A lot of people get this confused. I think the problem is that we don’t have a word for “iterable that’s not an iterator”,
isn't that simply an "Iterable"? -- as above, yes, all iterators are iterables, but when we speak of iterables specifically, we are usually referring to the ones that are not iterators.
It *is* the distinction I'm making with the word "explicit". I never
use "next" on an open file.
nor do I, but there was a conversation on this list a while back, with folks saying that they DID do that. The fact we don't may be because file objects have methods that predate them being iterators. I still don't really think of them as being primarily iterators (and they really aren't for binary files), but objects that happen to have the iteration protocol tacked on for convenience. So I use for loops when it's appropriate, and readline() and the like when it's not.
Students often want to know why this doesn’t work:
with open("file") as f:
    for line in f:
        do_stuff(line)
    for line in f:
        do_other_stuff(line)
… when this works fine:
with open("file") as f:
    lines = f.readlines()
for line in lines:
    do_stuff(line)
for line in lines:
    do_other_stuff(line)
This question (or a variation on it) gets asked by novices every few days on StackOverflow; it’s one of the most common duplicates.
The answer is that files are iterators, while lists are… well, there is no word.
yes, there is -- they are "lists" :-) -- but if you want to be more general, they are Sequences. And that's actually an important distinction to make in this case -- the fact that calling readlines() reads the entire file into a list all at once is maybe more important than the fact that it doesn't get "exhausted" by looping through it.

My way to teach that is to say that:

for line in a_file:
    do_something_with(line)

is analogous to:

while True:
    line = a_file.readline()
    if not line:
        break
    do_something_with(line)

rather than:

for line in a_file.readlines():
    do_something_with(line)

Or heck, simply say that readlines() reads the whole file at once into a list, and the file object has nothing to do with it anymore. Whereas looping through the lines in a for loop is getting the lines one by one from the file object, so once you've gotten them all, there are no more. Which doesn't require me talking about iterators or iterables, or iter() or next().

There is a place to get into all that, but I don't think it needs to be that early in the game. And I've never had a problem with this in my intro classes.

Bringing this back to the original topic: I suppose we *could* have a "file_view" object that acted like the list you get from readlines(), but actually called seek() on the underlying file to give you the lines lazily one at a time. That would be, shall we say, problematic, performance wise, but it could be done.

You can explain it anyway. In fact, you _have_ to give an explanation with analogies and examples and so on, and that would be true even if there were a word for what lists are. But it would be easier to explain if there were such a word, and if you could link that word to something in the glossary, and a chapter in the tutorial.
Still not sure why "Sequence" doesn't work here? Granted, there *are* some "iterables that aren't iterators" that aren't Sequences (like dict views), but they are Iterable Containers, and I think you can talk about them as "views" well enough. Though now that I've written that, maybe we should have "Iterable" and "Iterator" as ABCs.
I agree there's "amortized zero" cost to the crowd who
would use those names fairly frequently in design discussions, but there is a cost to the "lazy in the technical sense" programmer, who might want to read the documentation if it gave "simple answers to simple questions",
We shouldn’t define everything up front, just the most important things. But this is one of the most important things. People need to understand this distinction very early on to use Python, and many of them don’t get it, hence all the StackOverflow duplicates. People run into this problem well before they run into a problem that requires them to understand the distinction between arguments and parameters, or protocols and ABCs, or Mapping and dict.

Sure, but we can still use the simple answers, like "Sequence" as above, in most cases. That does not match with my experience at all. Yes, maybe the file-as-iterator example, but that can be explained without getting into the iteration protocol. I've taught intro to Python for many years, and never felt the need to clearly define the iteration protocol early in the class. And I have scientist-programmers on my team that are very productive after years who probably don't get it even now.

But the distinction between iterators and things-like-list-and-so-on comes
up earlier, and a lot more often, so a word for that would buy us a lot more.
And "iterable" doesn't work?
The one difference between files and generators is that you can actually reset the file object by calling seek(0). But that doesn’t make file not an iterator. It just makes file an iterator with an extra feature that most iterators don’t have.
indeed -- and "resetting" is simply not part of the iterator protocol.
And I’m pretty sure that’s exactly the confusion that led you to think that dict_keys have weird behavior, and to suggest the same weird behavior for sequence views
I'm not sure who "you" is in this sentence, but I think it may be nobody. I think *I* started this "resettable iterator" because I did some iPython experimentation on dict_keys() at 2:00am, and made a stupid mistake, which led me to believe that dict_keys had this weird resettable property. But that was a mistake on my part, and I can't even replicate what I did to give myself that idea.

Back to the Sequence View idea, I need to write this up properly, but I'm thinking something like (using a concrete example of list):

list.view is a read-only property that returns an indexable object. Indexing that object with a slice returns a list_view object:

a_view = list.view[a:b:c]

a_view is a list_view object. A list_view object is an immutable sequence. Indexing it returns elements from the original list.

Slicing a list_view returns ???? I'm not sure what here -- it should probably be a copy, so a new list_view object referencing the same list? (That will need to be thought out carefully.)

Calling .view on a list_view is another trick -- does it reference the host view? Or go straight back to the original sequence?

iter(a_list_view) returns a list_viewiterator. Iterating that gets you items from the "host" on the fly.

All this is a fair bit more complicated than my original idea -- which was to not have a full view, but simply an iterator you can get from slice notation. But it would also open up a world of possibilities!

-CHB

--
Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
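A minimal sketch of what such a view might look like (the name SequenceView and every detail here are hypothetical, not an agreed design). It resolves the slicing-a-view question by composing the slice back onto the original sequence, storing the selected indices as a range so that slices of slices "flatten out" the way range slicing does:

```python
from collections.abc import Sequence

class SequenceView(Sequence):
    """Immutable read-only view of a slice of an underlying sequence.

    Hypothetical sketch: stores the selected indices as a range, so
    slicing a view flattens out the way range(100)[2::2][1::3] does.
    """

    def __init__(self, seq, slc=slice(None)):
        self._seq = seq
        self._indices = range(len(seq))[slc]

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # slicing a view returns a new view on the *original* sequence
            view = SequenceView.__new__(SequenceView)
            view._seq = self._seq
            view._indices = self._indices[i]
            return view
        return self._seq[self._indices[i]]

a_list = list(range(100))
a_view = SequenceView(a_list, slice(2, None, 2))   # like a_list.view[2::2]
w = a_view[1::3]                                   # composes to [4::6]
assert list(w) == a_list[4::6]
assert w[0] == 4
```

The Sequence ABC supplies iteration, containment, and so on for free from `__len__` and `__getitem__`, so a view is automatically iterable without being an iterator.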

On May 13, 2020, at 12:40, Christopher Barker <pythonchb@gmail.com> wrote:

I hope you don’t mind, but I’m going to take your reply out of order to get the most important stuff first, in case anyone else is still reading. :)
Back to the Sequence View idea, I need to write this up properly, but I'm thinking something like:
(using a concrete example or list)
list.view is a read-only property that returns an indexable object. indexing that object with a slice returns a list_view object
a_view = list.view[a:b:c]
a_view is a list_view object
a list_view object is an immutable sequence. Indexing it returns elements from the original list.
Can we just say that it returns an immutable sequence that blah blah, without defining or naming the type of that sequence? Python doesn’t define the types of most things you never construct directly. (Sometimes there is a public name for it buried away in the types module, but it’s not mentioned anywhere else.) Even the dict view objects, which need a whole docs section to describe them, never say what type they are. And I think this is intentional. For example, nowhere does it say what type function.__get__ returns, only what behavior that object has—and that allowed Python 3 to get rid of unbound methods, because a function already has the right behavior. And nobody even notices that list and tuple use the same type for their __iter__ in some Python implementations but not others. Similarly, I think dict.__iter__() used to return a different type from dict.keys().__iter__() in CPython but now they share a type, and that didn’t break any backward compatibility guarantees. And it seems there’s no reason you couldn’t use the same generic sequence view type on all sequences, but also it’s possible that a custom one for list and tuple might allow some optimization (and even more likely so for range, although it may be less important). So if you don’t specify the type, that can be left up to each version of each implementation to decide.
slicing a list view returns ???? I'm not sure what here -- it should probably be a copy, so a new list_view object refgerenceing the same list? That will need to be thought out carefully)
Good question. I suppose there are three choices: (1) a list (or, in general, whatever the original object returns from slicing), (2) a new view of the same list, or (3) a view of the view of the list. I think I agree with you here that (2) is the best option. In other words, lst.view[2::2][1::3] gives you the exact same thing as lst.view[4::6]. At first that sounds weird because if you can inspect the attributes of the view object, there’s no way to see that you did a [1::3] anywhere. But that’s exactly the same thing that happens with, e.g., range(100)[2::2][1::3]. You just get range(4, 100, 6), and there’s no way to see that you did a [1::3] anywhere. And the same is true for memoryview, and for numpy arrays and bintrees tree slices—despite them being radically different things in lots of other ways, they all made the same choice here. And even beyond Python, it’s what slicing a slice view does in Swift (even though other kinds of views of views don’t “flatten out” like this, slice views of slice views do), and in Go. (Although C++20 is a counterexample here.)
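The range and memoryview behavior described here is easy to verify (a quick editorial check, not from the original message):

```python
# Slicing a range slices the already-selected indices, so the
# intermediate [1::3] step leaves no trace on the result:
r = range(100)
assert r[2::2][1::3] == range(4, 100, 6)

# memoryview composes slice-of-slice the same way:
m = memoryview(bytes(range(100)))
assert m[2::2][1::3].tolist() == list(range(100))[4::6]
```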
calling.view on a list_view is another trick -- does it reference the host view? or go straight back to the original sequence?
I think it’s the same answer again. In fact, I think .view on any slice view should just return self. Think about it: whether you decided that lst.view[2::2][1::3] gives lst.view[4::6] or a nested view-of-a-view-of-a-list, it would be confusing if lst.view[2::2].view[1::3] gave you the other one, and what other options would make sense? And, unless there’s some other behavior besides slicing on view properties, if self.view slices the same as self, it might as well just be self.
iter(a_list_view) returns a list_viewiterator.
Here, it seems even more useful to leave the type unspecified. For list (and tuple) in CPython, I’m not sure if you can get away with using the special list_iterator type used by list and tuple (which accesses the underlying array directly), or, if not that, the PySeqIter type used for old-style iter-by-indexing, but if you can, it would be both simpler and more efficient. And similarly, range.view might be able to use the range_iterator type. Or, if you can’t do that, a generic PyIter around tp_next would be less efficient than a custom type, but again simpler, and the efficiency might not matter. Or, if you just had a single sequence view type rather than custom ones for each sequence type, that would obviously mean a single iterator type. And so on. That all seems like quality-of-implementation stuff that should be left open to whatever turns out to be best.
iterating that gets you items from the "host" on the fly.
All this is a fair bit more complicated than my original idea -- which was to not have a full view, but simply an iterator you can get from slice notation.
But it would also open up a world of possibilities!
Yes, in the same way that range (and 2.x xrange) is more complicated but more useful than a hypothetical irange and 3.x dict.keys() (and 2.7 dict.viewkeys()) is more complicated but more useful than 2.6 dict.iterkeys(). I think it’s worth it, but it is a trade off. Now onto the stuff that probably nobody else cares about:
It took me a good while to "get" the distinction between an iterator and an iterable, and I still misuse those terms sometimes.
Maybe because iterable is an awkward word (that my spell checker doesn't recognize)?
My spellchecker is happy with Iterable with a capital I (because it’s seen me type so much Python code?) but complains about iterable with a lowercase i. Or just autocorrects it—sometimes to capital-I Iterable, sometimes to utterable. (Which I wouldn’t think is a word that comes up often enough in anyone’s usage to be a common autocorrect target. Maybe unutterable, but even then only if you’re talking about Lovecraftian horror or religious mysticism.)
But it's also because there is a clear definition for "Iterator" in Python, but the term is used a bit more generally in vague CS nomenclature.
Yes. And in different languages, too. In C++, iterators are an abstraction of pointers; in OCaml they’re an abstraction of HOFs like map; worst of all, Swift built everything around these three concepts they call “sequence”, “iterator”, and “generator”, clearly aimed at getting the best of both worlds from Python and C++, but all of those concepts mean the wrong thing if you’re coming from either language, and then they changed things between 1.0 and 2.0 just in case anyone wasn’t confused yet.
The other confusion is that an iterable is not an iterator, but iterators are, in fact, iterables (i.e. you can call iter() on them).
Yes. Which is essential to a lot of things about Python’s design, but not essential to the concept at an abstract CS level.
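That design point is easy to demonstrate: every iterator is itself iterable, because calling iter() on an iterator returns the iterator itself.

```python
it = iter([1, 2, 3])
assert iter(it) is it   # iterators are iterable: iter() returns self

# So an iterator works anywhere an iterable does, e.g. in sum():
assert sum(it) == 6
```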
I think this is mostly the result of the "for loop" protocol pre-dating the iteration protocol, and wanting to have the same nifty way to iterate everything. That is -- we want to be able to use iterators in for loops, and not have to call iter() on anything before using a for loop. But in fact, I think this is a nice convenience, and maybe one that would be kept in a new language anyway -- it's really handy that you can do A LOT without knowing about iter() and next() and StopIteration, while those tools are still there when needed.
I’m not sure about that. There are at least two ways to design a language that doesn’t need both concepts, and both have been tried, even if nobody’s been quite successful yet. The first is the C++ way: just put iterators front and center and make people call iter (or, in their case, begin and end) all over the place. This is pretty easy to understand, and it has some nice advantages (like being able to loop over C strings and arrays without wrapping them). It’s just not actually usable in everyday code unless you start layering a bunch of stuff on top of it, at which point you’ve only avoided the concept of “iterable” by making people learn the concept of “implicitly convertible to iterator range” instead. The second is the Swift way (I’m going to use Python terms rather than Swift ones here to avoid confusion): hide iterators as much as possible. (Java and C# are also gradually moving in this direction, but have a lot more legacy weighing them down.) In Swift, you can’t loop over iterators, or pass them to functions like map—and that’s fine, because functions like map don’t return iterators, they return views. The only place you ever see an iterator in the wild is inside the implementation of a handful of functions like map and zip that really do need to munge iterators manually, and many people will never even read, much less write, such a function. If you do happen to get an iterator somehow and want to use it as an iterable, you have to wrap it in a trivial view object that delegates to it, but this almost never comes up. Sadly, this makes it so much harder to write your stdlib that Apple took three tries (after going public) before they got it right. Some day, someone probably will design a language that doesn’t require most people to learn both concepts and is actually usable. Until then, I’m happy we’ve got Python. :)
Bringing this back to the original topic:
I suppose we *could* have a "file_view" object that acted like the list you get from readlines(), but actually called seek() on the underlying file to give you the lines lazily one at a time. That would be, shall we say, problematic, performance wise, but it could be done.
I remember learning that the way to do this was the nifty new linecache module. Nobody seems to teach that anymore in the 3.x days, but it’s still there, and works as expected for Unicode text and everything. But for something more general, you probably wouldn’t want to bother with a special file view. You can very easily write a generic view that takes _any_ iterator and looks like a sequence, pulling and caching the elements on demand. At a certain point, a lot of people think they want this, then you show them how easy it is to build that, and they think it’s cool—but they never use it again. Caching indices instead of the actual lines seems like a nice optimization, but you’d need a specific use case where the time cost is worth the space savings, and if nobody even uses the generic version, nobody needs to optimize it, right? :) And now on to the stuff that maybe you don’t even care about:
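For the record, that generic take-any-iterator view might look something like this sketch (the name CachedSeq and all the details here are illustrative, not anything in the stdlib):

```python
class CachedSeq:
    """Wrap any iterator in a lazy, sequence-like view that pulls and
    caches items on demand. Positive indices only, for simplicity."""

    def __init__(self, iterator):
        self._it = iter(iterator)
        self._cache = []

    def __getitem__(self, index):
        # Pull from the underlying iterator until index is in the cache.
        while index >= len(self._cache):
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                raise IndexError(index) from None
        return self._cache[index]

lines = CachedSeq(line for line in ["spam\n", "eggs\n", "ham\n"])
assert lines[2] == "ham\n"
assert lines[0] == "spam\n"   # already cached, no re-reading
```

Because it defines __getitem__ with integer indices, it is even iterable via the old iter-by-indexing protocol, so a for loop over it works too.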
On Wed, May 13, 2020 at 10:52 AM Andrew Barnert <abarnert@yahoo.com> wrote:
On May 12, 2020, at 23:29, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
> A lot of people get this confused. I think the problem is that we
> don’t have a word for “iterable that’s not an iterator”,
isn't that simply an "Iterable" -- as above, yes, all iterators are iterables, but when we speak of iterables specifically, we are usually referring to the ones that are not iterators.
No, we really aren’t. Iterators being iterable is not just a weird quirk that rarely comes up; it’s essential to things you do every day.
The everyday concept behind “iterable” is “something you can use in a for loop”. (You don’t have to get into the technical “something you can call iter on and get an iterator” that often—but when you do, it’s easy to work out that they’re identical concepts anyway.)
The main thing you do with generator expressions, zips, etc. is not call next and check StopIteration, it’s stick them in a for loop (or generator expressions or map or whatever), exactly the same way you use lists and sets and ranges. So if you think of the word “iterable” in a way that doesn’t include generators and zips and so on, you’re just going to confuse yourself.
It *is* the distinction I'm making with the word "explicit". I never
use "next" on an open file.
nor do I, but there was a conversation on this list a while back, with folks saying that they DID do that.
This is your mail agent being a pain again. You’re the one who said that, I quoted you saying it, and now you’re agreeing with yourself. Can we pass a law that anyone who’s worked on any of the major current mail clients is not allowed to work in software anymore? I think that would benefit the world more than any change we can make to Python…

Personally, I actually do next files. For example:

    with open(path) as f:
        next(f)  # skip the first line of the 2-line header
        for row in csv.DictReader(f):
            ...

Of course I could have used f.readline() just as well, and I’ve seen as many people do the same thing with readline as with next. It just seems a little more unusual to ignore the result of readline than to ignore the result of next, so when writing it, next feels more natural.
Students often want to know why this doesn’t work:
    with open("file") as f:
        for line in f:
            do_stuff(line)
        for line in f:
            do_other_stuff(line)
… when this works fine:
    with open("file") as f:
        lines = f.readlines()
        for line in lines:
            do_stuff(line)
        for line in lines:
            do_other_stuff(line)
This question (or a variation on it) gets asked by novices every few days on StackOverflow; it’s one of the top common duplicates.
The answer is that files are iterators, while lists are… well, there is no word.
yes, there is -- they are "lists" :-) -- but if you want to be more general, they are Sequences.
But that’s the wrong generalization. Because sets also work the same way, and they aren’t Sequences. Nor are dict views, or many of the other kinds of things-that-can-be-iterated-over-and-over-independently. Plus, this just confuses what Sequences are about. Sequence is a dead simple concept: if seq[0] makes sense, it’s a sequence; if not, it isn’t. (Sure, there’s other stuff crammed in there, like being reversible and in-testable and index-searchable, but all of that stuff is stuff you can obviously and trivially build on top of indexing, so you don’t need to think about it. And there’s the subtlety that 0 is a perfectly cromulent dict key, which unfortunately you do sometimes need to think about, but most of the time you don’t. For the most part, Sequence means you can index it.)
Or heck, simply say that readlines() reads the whole file at once into a list, and the file object has nothing to do with it anymore. Whereas looping through the lines in a for loop is getting the lines one by one from the file object, so once you've gotten them all, there are no more.
Which doesn't require me talking about iterators or iterables, or iter() or next()
Sure, which is great right up until they ask the same question about why they can’t iterate twice over a map or zip. (Which is another very common novice dup on StackOverflow. It’s especially sad when they made a commendable start at debugging things on their own by writing `for pair in pairs: print(pair)`, which instead of rewarding them just made the problem even worse.) Or why they _can_ iterate twice over a range, even though a range clearly isn’t building a whole list in advance. (Especially when they read in some blog that range used to return a list but now it doesn’t. Especially if the person writing that blog misused the word “iterator” in the same way you did earlier, which many of them do.)
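The map/zip version of the question is easy to reproduce; note how the first pass "uses up" the zip, while a range survives repeated iteration:

```python
pairs = zip([1, 2, 3], "abc")
assert list(pairs) == [(1, "a"), (2, "b"), (3, "c")]
assert list(pairs) == []           # zip is an iterator: one pass and done

r = range(3)
assert list(r) == [0, 1, 2]
assert list(r) == [0, 1, 2]        # range is not an iterator: iterate again
```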
You can explain it anyway. In fact, you _have_ to give an explanation with analogies and examples and so on, and that would be true even if there were a word for what lists are. But it would be easier to explain if there were such a word, and if you could link that word to something in the glossary, and a chapter in the tutorial.
Still not sure why "Sequence" doesn't work here? Granted, there *are* some "iterables that aren't iterators" that aren't Sequences (like dict views), but they are Iterable Containers, and I think you can talk about them as "views" well enough.
Again, surely you don’t want to tell people that sets, dicts, dict views, etc. are Sequences. And if you say, “well, they aren’t Sequences but they are Containers”, that isn’t very helpful—a Container is a thing that supports “in”, which does happen to be true for those types, but it isn’t relevant, so that’s just confusing. The word “view” _is_ great for things-like-dict-keys. That’s why I started off this thread asking for a view instead of an iterator, which I thought would be immediately clear. Unfortunately, it isn’t, or we wouldn’t even be having this discussion.
Though now that I've written that, maybe we Should have "Iterable" and "Iterator" as ABCs.
We already do. And Iterator is a subclass of Iterable, just as it should be. We don’t have an ABC for iterables that give you a new iterator over their contents, that doesn’t use up those contents, every time you iterate them. But that’s not surprising given that we don’t have a word for it. ABCs are named based on either a protocol that already had a name (like Sequence or Coroutine or Rational) or a single method (like Reversible and Hashable), not the other way around. (The only exception I can think of is the ones in io, but they just prove the point—nobody talks about BufferedIOBases as a concept like Sequences or Coroutines, and on the very rare occasions where I need to type-check one, I have to go read the docs to see what I’m supposed to check and what it means to do so.)
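Those ABCs make the relationship easy to check:

```python
from collections.abc import Iterable, Iterator

assert issubclass(Iterator, Iterable)        # every Iterator is an Iterable
assert isinstance([1, 2], Iterable)          # a list is iterable...
assert not isinstance([1, 2], Iterator)      # ...but is not an iterator
assert isinstance(iter([1, 2]), Iterator)    # its iterator is both
assert isinstance(iter([1, 2]), Iterable)
```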
But the distinction between iterators and things-like-list-and-so-on comes up earlier, and a lot more often, so a word for that would buy us a lot more.
And "iterable" doesn't work?
No, it doesn’t. You can’t use “iterable” to mean things like lists and sets but not generators and files, because iterators are every bit as iterable. This would be like saying you can just use “animal” for things like dogs and people but not frogs and birds, or “number” for things like 1/4 and -3/17 but not e and pi, or “Christian” for people like Lutherans and Methodists but not Catholics and Orthodox, etc. We have words for concepts like “mammal” and “rational” and “Protestant”, because you can’t just say “animal” and “number” and “Christian” or you’re being confusing.

On Wed, May 13, 2020 at 7:50 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On May 13, 2020, at 12:40, Christopher Barker <pythonchb@gmail.com> wrote:
I hope you don’t mind, but I’m going to take your reply out of order to get the most important stuff first, in case anyone else is still reading. :)
And I'm going to keep this reply to one topic ...
Back to the Sequence View idea, I need to write this up properly, but I'm
thinking something like:
Can we just say that it returns an immutable sequence that blah blah, without defining or naming the type of that sequence?
Sure -- but it ends up getting a lot more wordy if you don't have a name for a thing. Python doesn’t define the types of most things you never construct directly.
No, but there are ABCs so that might be the way to talk about this. (Sometimes there is a public name for it buried away in the types module,
but it’s not mentioned anywhere else.) Even the dict view objects, which need a whole docs section to describe them, never say what type they are.
fair enough.
And nobody even notices that list and tuple use the same type for their __iter__ in some Python implementations but not others.
I sure haven't noticed that :-)
Similarly, I think dict.__iter__() used to return a different type from dict.keys().__iter__() in CPython but now they share a type, and that didn’t break any backward compatibility guarantees.
And it seems there’s no reason you couldn’t use the same generic sequence view type on all sequences,
Indeed, I was thinking that would be the way to prototype it anyway -- but yes, of course you would want to be able to write custom, optimized versions in some cases. Anyway -- yes, of course, this should all be Duck Typed / Protocol / API based, not any particular type. It's just hard to talk about it that way -- kind of like the awkward "file-like object".
slicing a list view returns ???? I'm not sure what here -- it should probably be a copy, so a new list_view object referencing the same list? That will need to be thought out carefully)
Good question. I suppose there are three choices: (1) a list (or, in general, whatever the original object returns from slicing), (2) a new view of the same list, or (3) a view of the view of the list.
I think I agree with you here that (2) is the best option. In other words, lst.view[2::2][1::3] gives you the exact same thing as lst.view[4::6].
At first that sounds weird because if you can inspect the attributes of the view object, there’s no way to see that you did a [1::3] anywhere.
But that’s exactly the same thing that happens with, e.g., range(100)[2::2][1::3]. You just get range(4, 100, 6), and there’s no way to see that you did a [1::3] anywhere.
I have NEVER thought to slice a range -- but it is kind of cool.
calling .view on a list_view is another trick -- does it reference the host view? or go straight back to the original sequence?
I think it’s the same answer again. In fact, I think .view on any slice view should just return self.
Hmm -- this makes me nervous, but as long as it's immutable, why not? Think about it: whether you decided that lst.view[2::2][1::3] gives
lst.view[4::6] or a nested view-of-a-view-of-a-list, it would be confusing if lst.view[2::2].view[1::3] gave you the other one, and what other options would make sense? And, unless there’s some other behavior besides slicing on view properties, if self.view slices the same as self, it might as well just be self.
exactly.
iter(a_list_view) returns a list_viewiterator.
Here, it seems even more useful to leave the type unspecified. For list (and tuple) in CPython,
I agree -- I was using that name for convenience of talking about it, but that probably simply adds more confusion.
... That all seems like quality-of-implementation stuff that should be left open to whatever turns out to be best.
Exactly. See another reply for the " stuff that probably nobody else cares about: " -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On May 13, 2020, at 20:32, Christopher Barker <pythonchb@gmail.com> wrote:
On Wed, May 13, 2020 at 7:50 PM Andrew Barnert <abarnert@yahoo.com> wrote: On May 13, 2020, at 12:40, Christopher Barker <pythonchb@gmail.com> wrote:
Back to the Sequence View idea, I need to write this up properly, but I'm thinking something like:
Can we just say that it returns an immutable sequence that blah blah, without defining or naming the type of that sequence?
Sure -- but it ends up getting a lot more wordy if you dont' have a name for a thing.
You’re right. Looking at the dict and similar docs, what they mostly do is to talk about “the key view”, and sometimes even “the key view type”, etc., in plain English, while being careful not to say anything that implies it has any particular name or identity. (In particular, “key view type” obviously can’t be the name of an actual type, because it has a space in it.) Anyway, if the proposal gets far enough to need docstrings and documentation, I guess you can worry about getting it right then, but until then you don’t have to be that careful; as long as we all know that list_view isn’t meant to name a specific type (and to be guaranteed distinct from tuple_view), I think we’ll all be fine.
Python doesn’t define the types of most things you never construct directly.
No, but there are ABCs so that might be the way to talk about this.
That’s a good point. Does a sequence slice view (or a more general sequence view?) need an ABC beyond just being a Sequence? I wasn’t expecting that to be needed, but now that you bring it up… if there’s, say, a public attribute/property or method to get the underlying object, presumably it should be the same name on all such views, and maybe that’s something you’d want to be documented, and maybe even testable, by an ABC after all.
And nobody even notices that list and tuple use the same type for their __iter__ in some Python implementations but not others.
I sure haven't noticed that :-)
It’s actually a bit surprising what tuple and list share under the covers in CPython, even at the public C API level.
calling.view on a list_view is another trick -- does it reference the host view? or go straight back to the original sequence?
I think it’s the same answer again. In fact, I think .view on any slice view should just return self.
Hmm -- this makes me nervous, but as long as its immutable, why not?
Exactly. The same as these:

    >>> s = ''.join(random.choices(string.ascii_lowercase, k=10))
    >>> s[:] is s
    True
    >>> str(s) is s
    True
    >>> copy.copy(s) is s
    True
    >>> t = tuple(s)
    >>> t[:] is t
    True

etc. But all of those are just allowed, and implemented that way by CPython, not guaranteed by the language. So maybe the same should be true here. You can implement .view as just self, but if other implementations want to do something different they can, as long as it meets the same documented behavior (which could just be something like “view-slicing a view slice has the same effect as slicing a view slice” or something?).

OK, now for: On Wed, May 13, 2020 at 7:50 PM Andrew Barnert <abarnert@yahoo.com> wrote:
Now onto the stuff that probably nobody else cares about:

This is your mail agent being a pain again. You’re the one who said that, I quoted you saying it, and now you’re agreeing with yourself.
Of COURSE I agree with myself :-) -- but I made that mistake partially because there was a big thread about this a while back -- and I wasn't the only one who said that.
Students often want to know why this doesn’t work:
    with open("file") as f:
        for line in f:
            do_stuff(line)
        for line in f:
            do_other_stuff(line)
… when this works fine:
    with open("file") as f:
        lines = f.readlines()
        for line in lines:
            do_stuff(line)
        for line in lines:
            do_other_stuff(line)
This question (or a variation on it) gets asked by novices every few days on StackOverflow; it’s one of the top common duplicates.
The answer is that files are iterators, while lists are… well, there is no word.
yes, there is -- they are "lists" :-) -- but if you want to be more general, they are Sequences.

But that’s the wrong generalization. Because sets also work the same way, and they aren’t Sequences. Nor are dict views, or many of the other kinds of things-that-can-be-iterated-over-and-over-independently.

But file.readlines() does not return any of those objects. It returns a list. If you see this as an opportunity to teach about the iteration protocol, then sure, you'd want to make that distinction. But I think the file object is the wrong first example -- it's an oddball, having both the iteration protocol, AND methods for doing most of the same things. Most iterables don't have the equivalent of readlines() or readline() -- and in this case, I think THAT's the main source of confusion, rather than the iterable vs iterator distinction. And you sure would want to make sure they "get" that readlines() is "greedy" -- it's like calling list() on an iterable. But why introduce that now? <snip>
Which doesn't require me talking about iterators or iterables, or iter() or next()
Sure, which is great right up until they ask the same question about why they can’t iterate twice over a map or zip.
Then THAT's the time to get into the iteration protocol -- map and zip are much more "clean" examples.
You can explain it anyway. In fact, you _have_ to give an explanation with analogies and examples and so on, and that would be true even if there were a word for what lists are. But it would be easier to explain if there were such a word, and if you could link that word to something in the glossary, and a chapter in the tutorial.
The word “view” _is_ great for things-like-dict-keys. That’s why I started off this thread asking for a view instead of an iterator, which I thought would be immediately clear. Unfortunately, it isn’t, or we wouldn’t even be having this discussion.

OK -- time for someone to come up with a word for "Iterable that isn't an Iterator" -- I'd start using it :-) Well, it's as clear as maybe anything else -- at least after talking about it a while ...

- CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On May 13, 2020, at 20:49, Christopher Barker <pythonchb@gmail.com> wrote:
OK, now for:
On Wed, May 13, 2020 at 7:50 PM Andrew Barnert <abarnert@yahoo.com> wrote:
But that’s the wrong generalization. Because sets also work the same way, and they aren’t Sequences. Nor are dict views, or many of the other kinds of things-that-can-be-iterated-over-and-over-independently.
But file.readlines() does not return any of those objects. It returns a list. If you see this as an opportunity to teach about the iteration protocol, then sure, you'd want to make that distinction. But I think the file object is the wrong first example -- it's an oddball, having both the iteration protocol, AND methods for doing most of the same things.
Agreed, it’s not an ideal first example, and zip or map would be much better. Unfortunately, files seem to be the example that many people run into first. (Or maybe lots of people do run into map first, but fewer of them get confused and need to go ask for help?) When you’re teaching a class, you can guide people to hit the things you want them to think about, but the intern, or C# guru who only touches Python once a year, or random person on StackOverflow that I’m dealing with apparently didn’t take your class. This is where they got confused, so this is what they ask about.
Most iterables don't have the equivalent of readlines() or readline() -- and in this case, I think THAT's the main source of confusion, rather than the iterable vs iterator distinction.
But notice that they’re already writing `for line in f:`. That means they *do* understand that files are iterables. Sure, they probably don’t know the word “iterable”, but they understand that files are things you can use in a for loop (and that’s all “iterable” really means, unless you’re trying to implement rather than use them). And honestly, if Python didn’t make iteration so central, I’m not sure as many novices would get that far that fast in the first place. Imagine if, instead of just calling open and then doing `for line in f:`, you had to call an opener factory to get a filesystem opener, call that to get a file object, bind a line-buffered read stream to it, then call a method on that read stream with a callback function that processes the line and makes the next async read call. Anyone who gets that far is probably already a lot more experienced with JavaScript than someone who iterates their first file is with Python.
You can explain it anyway. In fact, you _have_ to give an explanation with analogies and examples and so on, and that would be true even if there were a word for what lists are. But it would be easier to explain if there were such a word, and if you could link that word to something in the glossary, and a chapter in the tutorial.
OK -- time for someone to come up with a word for "Iterable that isn't an Iterator" -- I'd start using it :-)
People used to loosely use “collection” for this, back before it was defined to mean “sized container”, but that no longer works. Maybe we need to come up with a word that can’t possibly have any existing meaning to anyone, like Standard Oil did with “Exxon”.

Andrew Barnert writes:
Maybe we need to come up with a word that can’t possibly have any existing meaning to anyone, like Standard Oil did with “Exxon”.
Unfortunately, I doubt such a word can be trademarked, so somebody else will borrow it with a slightly different meaning that becomes more popular (or popular with a crowd some of whom then turn to Python). But don't let the fact that the odds favor the house stop you, you can't win if you don't play!

Steve

Andrew Barnert writes:
And I’m pretty sure that’s exactly the confusion that led you to think that dict_keys have weird behavior,
That wasn't me .... I'm here to discuss documentation, not dict or sequence views. ;-) Changing the subject field to match.
Students often want to know why this doesn’t work:
with open("file") as f: for line in file: do_stuff(line) for line in file: do_other_stuff(line)
Sure. *Some* students do. I've never gotten that question from mine, though I do occasionally see

    with open("file") as f:
        for line in f:  # ;-)
            do_stuff(line)
    with open("file") as f:
        for line in f:
            do_other_stuff(line)

I don't know, maybe they asked the student next to them. :-)
The answer is that files are iterators, while lists are… well, there is no word.
As Chris B said, sure there are words: File objects are *already* iterators, while lists are *not*. My question is, "why isn't that instructive?"
We shouldn’t define everything up front, just the most important things. But this is one of the most important things. People need to understand this distinction very early on to use Python,
No, they don't. They neither understand, nor (to a large extent) do they *need* to. We cannot solve the problem of "lazy in the technical sense" programming by improving Python. It's a matter of optimizing programmer effort. If cargo culting and asking on Stack Overflow and bitching on Twitter or your personal blog when software doesn't DWIM is psychologically (and frequently time management-ly) cheaper than learning How Things Work, that's what people are going to do. I can't tell them they're wrong (except my own students, and they mostly ignore me until they run out of options other than listening to me :-).

ISTM that all we need to say is that:

1. An *iterator* is a Python object whose only necessary function is to return an object when next is applied to it. Its purpose is to keep track of "next" for *for*. (It might do other useful things for the user, e.g., file objects.)

2. The *for* statement and the *next* builtin require an iterator object to work. Since *for* always needs an iterator object, it automatically converts the "in" object to an iterator implicitly. (Technical note: for the convenience of implementors of 'for', when iter is applied to an iterator, it always returns the iterator itself.)

3. When a "generic" iterator "runs out", it's exhausted, it's truly done. It is no longer useful, and there's nothing you can do but throw it away. Generic iterators do not have a reset method. Specialized iterators may provide one, but most do not.

4. Objects that can be converted to iterators are *iterables*. Trivially, iterators are iterable (see technical note supra).

5. Most Python objects are not iterators, but many can be converted. However, some Python objects are constructed as iterators because they want to be "lazy". Examples are files (so that a huge file can be processed line by line without reading the whole thing into memory) and "generators" which yield a new item each time they are called.

But AFAIK we *do* say that, and it doesn't get through.
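The first four points above can all be demonstrated in a few lines:

```python
it = iter([10, 20])          # a list is an iterable, not an iterator
assert iter(it) is it        # iter() of an iterator is the iterator itself
assert next(it) == 10        # next() is all an iterator must support
assert next(it) == 20

try:
    next(it)
    raised = False
except StopIteration:
    raised = True
assert raised                # exhausted for good: no reset method
assert list(it) == []        # nothing left but to throw it away
```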
I can teach a child why a glass will break permanently when you hit it while a lake won’t by using the words “solid” and “liquid”.
Terrible example, since a glass is just a geologically slow liquid. ;-) Back to the discussion: the child can touch both, and does so frequently (assuming you don't feed them from the dog's bowl and also bathe them regularly). They've seen glasses break, most likely, and splashed water. Iterators have one overriding purpose: to be fed to *for* statements, be exhausted, and then discarded. This is so important that it's done implicitly and in every single *for* statement. We have the necessary word, "iterator," but students don't have the necessary experience of "touching" the iterator that *for* actually iterates over instead of the list that is explicit in the *for* statement. That iterator is created implicitly and becomes garbage as soon as the *for* statement ends. And there's no way for the student to touch it, it doesn't have a name! If you want to fix nomenclature, don't call them "files," don't call them "file objects," call them "file iterators". Then students have an everyday iterator they can touch. I'll guarantee that causes other problems, though, and gets a ton of resistance. Even from me. :-)
Yes, and defining terminology for the one distinction that almost always is relevant helps distinguish that distinction from the other ones that rarely come up. Most people (especially novices) don’t often need to think about the distinction between iterables that are sized and also containers vs. those that are not both sized and containers, so the word for that doesn’t buy us much. But the distinction between iterators and things-like-list-and-so-on comes up earlier, and a lot more often, so a word for that would buy us a lot more.
We have that word and distinction. A file object *is* an iterator. A list is *not* an iterator. *for* works *with* iterators internally, and *on* iterables through the magic of __iter__.
But you *don't* use seek(0) on files (which are not iterators, and in fact don't actually exist inside of Python, only names for them do). You use them on opened *file objects* which are iterators.
A file object is a file, in the same way that a list object is a list and an int object is an int.
No, it's not the same: your level of abstraction is so high that you've lost sight of the iterable/iterator distinction. All of the latter objects own their own data in a way that a file object does not. All of the latter objects are different from their iterators (where such iterators exist), while the file object is not.
The fact that we use “file” ambiguously for a bunch of related but contradictory abstractions (a stream that you can read or write, a directory entry, the thing an inode points to, a document that an app is working on, …) makes it a bit more confusing, but unfortunately that ambiguity is forced on people before they even get to their first attempt at programming, so it’s probably too late for Python to help (or hurt).
Agreed. I would be much happier if we could discuss an example that is *not* iterating over files but *does* come up every day on StackOverflow. Maybe zips would work but I'm not sure the motivation comes together the way it does for files (why do zips want to be lazy? what are the compelling examples for zip of "restarting the iteration where you left off" with a new *for* statement?)
When you open a file again, by default you get a new iterator which begins at the beginning, as you want for those others. My point is that none of the other types you mention are iterators.
I don’t get what you’re driving at here.
Simply that we have the necessary distinction already: iterators vs. everything else. IMO the problem is that the students have zero or very little experience of iterators other than files, and so think of file objects as weird iterables, rather than as iterators.
Lists, sets, ranges, dict_keys, etc. are not iterators. You can write `for x in xs:` over and over and get the values over and over. Because each time, you get a new iterator over their values.
You and I know that, because we know what an iterator is, and we know it's there because it has to be: *for* doesn't iterate anything but an iterator. But (except via a bytecode-level debugger) nobody has ever seen that iterator. You can use iter to get a similar iterator, of course, but it's not the same object that any for statement ever used. (Unless you explicitly created it with iter, but then you can't re-run the for statement on it the way you can with a list.)
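(A sketch of the difference: each loop over a list builds a fresh, invisible iterator, but an iterator you create explicitly with iter is shared, and exhausted, across loops:)

```python
xs = [1, 2, 3]
first = [x for x in xs]
second = [x for x in xs]
assert first == second == [1, 2, 3]   # a new hidden iterator each time

it = iter(xs)                         # now the iterator has a name
first = [x for x in it]
second = [x for x in it]              # same object, already exhausted
assert first == [1, 2, 3] and second == []
```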
The difference with files is just that they happen to exist in Python as iterables. But after
_What_ exists in Python as iterables?
Lists, tuples, sets, dicts, and other containers.
Files, maps, zips, generators, etc. are not like that. They’re iterators. If you write `for x in xs:` twice, you get nothing the second time, because each time you’re using the same iterator, and you’ve already used it up. Because iter(xs) is xs when it’s a file or generator etc.
Genexps are iterators, but generator functions (in the sense of the product of a def that contains "yield") are not even iterable. Those are iterator factories: calling one returns a generator object, which *is* an iterator.
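(A sketch of the factory/iterator distinction:)

```python
def gen():          # the def produces a factory, not an iterator
    yield 1
    yield 2

raised = False
try:
    iter(gen)       # the factory itself is not even iterable
except TypeError:
    raised = True
assert raised

g = gen()           # calling it builds a generator object...
assert iter(g) is g # ...which is an iterator
assert list(g) == [1, 2]
assert list(g) == []   # and exhausts like any other iterator

ge = (n * n for n in range(3))   # a genexp yields such an iterator directly
assert iter(ge) is ge
assert list(ge) == [0, 1, 4]
```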
The only representation of files in Python is file objects—the thing you get back from open (or socket.makefile or io.StringIO or whatever else)—and those are iterators.
The thought occurred to me, "What if that was a bad decision? Maybe in principle files shouldn't be iterators, but rather iterables with a real __iter__ that creates the iterator." I realized that I'd already answered my own question in part: I find it easy to imagine cases where I'd want to get some lines of input from a file as a higher-level unit, then stop and do some processing. The killer app for me is mbox files. Another plausible case is reading top-level Lisp expressions from a file (although that doesn't necessarily divide neatly into lines.) I also found it surprisingly complicated to think about the consequences to the type of making that change.

Going back to the documentation theme, maybe one way to approach explaining iterators is to start with the use case of files as (non-seekable) streams, show how 'for iteration' can be "restarted" where you left off in the file, and teach that "this is the canonical behavior of iterators; lists etc are *iterable* because 'for' automatically converts them to iterators 'behind the scenes'". If sockets or pipes were more familiar to beginning programmers, they might be better examples, but I think that files-as-streams might be the most familiar and approachable, though real files are far more flexible than just unseekable streams.

I'll try to take a look at the "official" tutorials and language documentation "sometime soon" and see if maybe this idea could be applied to improve them.

Steve
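(A sketch of "restarting where you left off": because file-like objects are iterators, a second for loop resumes at the next unread line. io.StringIO stands in for a real file here so the example is self-contained:)

```python
import io

f = io.StringIO("header\nbody 1\nbody 2\n")
for line in f:
    if line.strip() == "header":
        break                          # stop after the header...
rest = [line.strip() for line in f]    # ...and resume right where we left off
assert rest == ["body 1", "body 2"]
```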

On Fri, May 15, 2020 at 1:19 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
ISTM that all we need to say is that
1. An *iterator* is a Python object whose only necessary function is to return an object when next is applied to it. Its purpose is to keep track of "next" for *for*. (It might do other useful things for the user, eg, file objects.)
2. The *for* statement and the *next* builtin require an iterator object to work. Since for *always* needs an iterator object, it automatically converts the "in" object to an iterator implicitly. (Technical note: for the convenience of implementors of 'for', when iter is applied to an iterator, it always returns the iterator itself.)
That's not a mere technical detail - that's actually part of the definition of an iterator, namely that iter(x) is x. That's how you can tell that it's an iterator.
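(That test can be sketched as a tiny helper; the name is mine, not stdlib:)

```python
def is_iterator(x):
    """An iterator is recognizable by iter(x) being x itself."""
    try:
        return iter(x) is x
    except TypeError:
        return False          # not even iterable

assert is_iterator(iter([1, 2]))      # explicit list iterator
assert is_iterator(zip("ab", "cd"))   # zip objects are iterators
assert not is_iterator([1, 2])        # lists are merely iterable
assert not is_iterator(range(5))      # so are ranges
assert not is_iterator(42)            # and ints are neither
```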
4. Objects that can be converted to iterators are *iterables*. Trivially, iterators are iterable (see technical note supra).
5. Most Python objects are not iterators, but many can be converted. However, some Python objects are constructed as iterators because they want to be "lazy". Examples are files (so that a huge file can be processed line by line without reading the whole thing into memory) and "generators" which yield a new item each time they are called.
I don't like this term "converted". It's very frequently used to describe the construction of an integer based on the digits in a string, for instance, and in that case it's at least useful, but it's never really correct. There's nothing being converted anywhere. Getting an iterator from an iterable isn't converting anything. (Part of the reason I don't like int(txt) to be called a conversion is that it sounds like a mutation operation. If you have a variable that currently contains "1234", calling int(x) is not going to change what that variable contains. Seen way too many students get this wrong.) ChrisA
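(A sketch of that last point: int(x) builds a new object and leaves the original untouched, so nothing is "converted" in the mutating sense:)

```python
x = "1234"
n = int(x)
assert n == 1234
assert x == "1234"       # the string is unchanged
assert isinstance(x, str) and isinstance(n, int)
```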

Chris Angelico writes:
(Technical note: for the convenience of implementors of 'for', when iter is applied to an iterator, it always returns the iterator itself.)
That's not a mere technical detail - that's actually part of the definition of an iterator, namely that iter(x) is x. That's how you can tell that it's an iterator.
From the point of view of teaching iterators to novices, I think it *is* a technical detail. As has been pointed out, there are languages where iterators are *never* iterable. What's *necessary* to an iterator as a concept is that it have a __next__. Python chooses to define the iterator protocol with __iter__ being the identity for iterators because it makes implementing *for* straightforward.
I don't like this term "converted".
I refuse to die on that hill. :-) Suggest a better term, I'll happily use it until something even better comes along. Or I'll try to come up with a better one as I think about the documentation issue. Steve

On Fri, May 15, 2020 at 7:23 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Chris Angelico writes:
(Technical note: for the convenience of implementors of 'for', when iter is applied to an iterator, it always returns the iterator itself.)
That's not a mere technical detail - that's actually part of the definition of an iterator, namely that iter(x) is x. That's how you can tell that it's an iterator.
From the point of view of teaching iterators to novices, I think it *is* a technical detail. As has been pointed out, there are languages where iterators are *never* iterable. What's *necessary* to an iterator as a concept is that it have a __next__. Python chooses to define the iterator protocol with __iter__ being the identity for iterators because it makes implementing *for* straightforward.
Fair enough. Doesn't make a lot of difference, though.
I don't like this term "converted".
I refuse to die on that hill. :-) Suggest a better term, I'll happily use it until something even better comes along. Or I'll try to come up with a better one as I think about the documentation issue.
Unfortunately I don't have a really good generic term, but I would be inclined to "get an iterator from" an object rather than "convert" it to an iterator. It's still not a great term, but at least it allows you to think about getting multiple iterators from the same thing, even potentially getting different types of iterator. ChrisA

On Fri, May 15, 2020 at 5:37 AM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, May 15, 2020 at 7:23 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Chris Angelico writes:
I don't like this term "converted".
I refuse to die on that hill. :-) Suggest a better term, I'll happily use it until something even better comes along. Or I'll try to come up with a better one as I think about the documentation issue.
Unfortunately I don't have a really good generic term, but I would be inclined to "get an iterator from" an object rather than "convert" it to an iterator. It's still not a great term, but at least it allows you to think about getting multiple iterators from the same thing, even potentially getting different types of iterator.
Perhaps use the iter function name as the generic? "itered". As opposed to "iterated" or "iterated over". Example: "the statement below iterates over an iterator, itered from a sequence"

On Fri, May 15, 2020 at 05:58:16AM -0400, Ricky Teachey wrote:
Perhaps use the iter function name as the generic? "itered". As opposed to "iterated" or "iterated over".
Example:
"the statement below iterates over an iterator, itered from a sequence"
Or just avoid the issue:

"The statement below iterates over a sequence"

which is perfectly valid and correct. If we do feel the need to drill down into pedantic technical details, instead of making up ugly words that nobody will have any clue whatsoever what the meaning is[1], we could use one of many existing English words:

built from
formed from
constructed from
made from
fabricated from
created from
put together from

etc. And notice I avoided using terms which imply that the sequence itself is transformed into an iterator, such as "converting into".

[1] "Iter" is an old term for a passage, in particular an anatomical term for a passage in the brain, so "itered" would be the past tense of a verb to turn something into a passage.

-- Steven

I think maybe some of the trouble here, particularly in teaching, is the word "is" (in English, not the Python keyword). As in: "A file object IS an iterator" and "A zip object IS an iterator".

I know in OO parlance, "is a" can be used to designate subclassing (or an appropriate use for it) and it can be made a bit more generic to mean "has the interface of" -- which is how we are using it here. But in common language, there might be a bit of confusion: something is an iterator among other things vs something is PRIMARILY an iterator.

File objects are a good example: A file object provides all sorts of things, and iteration is only a small part of it. In fact, maybe due to my long history of working with pre-iterator file objects (python 1.5! -- yes, I am that old) I see them as fully featured objects that also happen to provide an iteration interface to line-oriented access to text files as a convenience -- calling them "an iterator" feels a bit odd.

Whereas things like the zip iterator, and many of the things in itertools, do primarily (only) iteration -- calling them "an iterator" makes a huge amount of sense. Even more so for things like list_iter, that are generally hidden from the user, and exist only to provide iteration.

In fact, that's why I think file objects may be one of the worst ways to introduce the iteration protocol to newbies. (though maybe because of the confusion, it's a good example of why it matters to understand the distinctions)

Iterables, on the other hand, usually provide a bunch of functionality other than iterability -- and indeed, most are fully functional without the iterator protocol at all. (maybe with ugly code :-) )

-CHB

On Fri, May 15, 2020 at 6:14 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, May 15, 2020 at 05:58:16AM -0400, Ricky Teachey wrote:
Perhaps use the iter function name as the generic? "itered". As opposed to "iterated" or "iterated over".
Example:
"the statement below iterates over an iterator, itered from a sequence"
Or just avoid the issue:
"The statement below iterates over a sequence"
which is perfectly valid and correct.
If we do feel the need to drill down into pedantic technical details, instead of making up ugly words that nobody will have any clue whatsoever what the meaning is[1], we could use one of many existing English words:
built from
formed from
constructed from
made from
fabricated from
created from
put together from
etc. And notice I avoided using terms which imply that the sequence itself is transformed into an iterator, such as "converting into".
[1] "Iter" is an old term for a passage, in particular an anatomical term for a passage in the brain, so "itered" would be the past tense of a verb to turn something into a passage.
-- Steven
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Steven D'Aprano writes:
constructed from
Thank you! This is what I will use. "Construction" in programming has the strong connotation of returning a new object (even though, say, int("1") may return an interned object, so not new in a sense). I'm not sure whether I'll rely on that looseness in describing the "iterator constructed from an iterator" or (more likely) treat iterators as a special case, and put "construction" in scare quotes if the connotation that an object is, or might be, an iterator is strong. Comments?[1]

Steve

Footnotes:
[1] Please don't bother telling me not to use scare quotes that way because many people are EASL or not well-educated. That is a hill I'm willing to die on. ;-) Scare quotes are a FAQ in my EASL classes (not just because I use them), so learning to read them is a real need IME.

On Fri, May 15, 2020 at 12:17:56PM +0900, Stephen J. Turnbull wrote:
Terrible example, since a glass is just a geologically slow liquid. ;-)
That's a myth :-) The key test for a liquid is not whether it flows, since solids also flow. (In solids, this usually happens very, very slowly, and is usually called "creep".) The key test is whether it can *drip*, and glass does not drip (except when molten). Glass is considered a non-crystalline or amorphous solid. https://en.wikipedia.org/wiki/Amorphous_solid

Pitch, on the other hand, is a geologically slow liquid :-) https://smp.uq.edu.au/pitch-drop-experiment

Arguments over the precise definition of states of matter are, to some degree, futile. I've seen amorphous solids described as "liquids that don't flow" and non-Newtonian liquids described as "solids that flow".

-- Steven

On 16/05/20 12:26 am, Steven D'Aprano wrote:
Arguments over the precise definition of states of matter are, to some degree, futile. I've seen amorphous solids described as "liquids that don't flow" and non-Newtonian liquids described as "solids that flow".
I think this just shows that nature doesn't always agree to fit into the neat categories we would like it to! -- Greg

Here's an article about recent research:

Oops, try again; Here's an article about recent research. I found it fascinating. https://www.quantamagazine.org/ideal-glass-would-explain-why-glass-exists-at... It starts: Glass is anything that’s rigid like a crystal, yet made of disordered molecules like a liquid. To understand why it exists, researchers are attempting to create the perfect, still-hypothetical “ideal glass.” -- Jonathan

On Thu, May 14, 2020, 11:20 PM Stephen J. Turnbull
I can teach a child why a glass will break permanently when you hit it while a lake won’t by using the words “solid” and “liquid”.
Terrible example, since a glass is just a geologically slow liquid. ;-)
It isn't though. I used to believe the old urban legend about glass being a (very slow) liquid too. It makes for good "impress your friends" chatter at a certain point. But it's not true, glass is an amorphous solid, and old windows are thicker at the bottom because they hung them that way. https://www.scientificamerican.com/article/fact-fiction-glass-liquid/

On May 14, 2020, at 20:17, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andrew Barnert writes:
Students often want to know why this doesn’t work:

    with open("file") as f:
        for line in file:
            do_stuff(line)
        for line in file:
            do_other_stuff(line)
Sure. *Some* students do. I've never gotten that question from mine, though I do occasionally see
    with open("file") as f:
        for line in f:  # ;-)
            do_stuff(line)
    with open("file") as f:
        for line in f:
            do_other_stuff(line)
I don't know, maybe they asked the student next to them. :-)
Or they got it off StackOverflow or Python-list or Quora or wherever. Those resources really do occasionally work as intended, providing answers to people who search without them having to ask a duplicate question. :)
The answer is that files are iterators, while lists are… well, there is no word.
As Chris B said, sure there are words: File objects are *already* iterators, while lists are *not*. My question is, "why isn't that instructive?"
Well, it’s not _completely_ not instructive, it’s just not _sufficiently_ instructive. Language is more useful when the concepts it names carve up the world in the same way you usually think about it. Yes, it’s true that we can talk about “iterables that are not iterators”. But that doesn’t mean there’s no need for a word. We don’t technically need the word “liquid” because we could always talk about “compressibles that are not solid” (or “fluids that are not gas”); we don’t need the word “bird” because we could always talk about “diapsids that are not reptiles”; etc. Theoretically, English could express all the same propositions and questions and so on that it does today without those words. But practically, it would be harder to communicate with. And that’s why we have the words “bird” and “liquid”. And the reason we don’t have a word for all diapsids except birds and turtles is that we don’t need to communicate about that category. Natural languages get there naturally; jargon sometimes needs help.
We shouldn’t define everything up front, just the most important things. But this is one of the most important things. People need to understand this distinction very early on to use Python,
No, they don't. They neither understand, nor (to a large extent) do they *need* to.
ISTM that all we need to say is that
1. An *iterator* is a Python object whose only necessary function is to return an object when next is applied to it. Its purpose is to keep track of "next" for *for*. (It might do other useful things for the user, eg, file objects.)
2. The *for* statement and the *next* builtin require an iterator object to work. Since for *always* needs an iterator object, it automatically converts the "in" object to an iterator implicitly. (Technical note: for the convenience of implementors of 'for', when iter is applied to an iterator, it always returns the iterator itself.)
I think this is more complicated than people need to know, or usually learn. People use for loops almost from the start, but many people get by with never calling next. All you need is the concept “thing that can be used in a for loop”, which we call “iterable”. Once you know that, everything else in Python that loops is the same as a for loop—the inputs to zip and enumerate are iterables, because they get looped over. “Iterable” is the fundamental concept. Yeah, it sucks that it has such a clumsy word, but at least it has a word. You don’t need the concept “iterator” here, much less need to know that looping uses iterables by calling iter() to get an iterator and then calling next() until StopIteration, until you get to the point of needing to read or write some code that iterates manually. Of course you will need to learn the concept “iterator” pretty soon anyway, but only because Python actually gives you iterators all over the place. In a language (like Swift) where zip and enumerate were views, files weren’t iterable at all, etc., you wouldn’t need the concept “iterator” until very late, but in Python it shows up early. But you still don’t need to learn about next(); that’s as much a technical detail as the fact that they return self from iter(). You want to know whether they can be used in for loops—and they can, because (unlike in Swift) iterators are iterable, and you already understand that.
3. When a "generic" iterator "runs out", it's exhausted, it's truly done. It is no longer useful, and there's nothing you can do but throw it away. Generic iterators do not have a reset method. Specialized iterators may provide one, but most do not.
Yes, this is the next thing you need to know about iterators. But you also need to know that many iterables don’t get consumed in this way. Lists, ranges, dicts, etc. do _not_ run out when you use them in a for loop. There’s a wide range of things you use every day that can be looped over repeatedly. And they all act the same way—each time you loop over them, you get all of their contents, from start to finish. That isn’t part of the Iterable protocol, or the concept underneath it. It can’t be, because it’s not true for some common iterables, like all iterators. People try to guess at what that concept is, and that’s where they run into problems. Because:
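(A sketch of the two behaviors side by side: a range can be looped over repeatedly, while an iterator such as a zip object is consumed by the first pass:)

```python
r = range(3)
assert list(r) == [0, 1, 2]
assert list(r) == [0, 1, 2]           # a range is not used up

z = zip("ab", "cd")
assert list(z) == [("a", "c"), ("b", "d")]
assert list(z) == []                  # the zip iterator is
```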
5. Most Python objects are not iterators, but many can be converted. However, some Python objects are constructed as iterators because they want to be "lazy". Examples are files (so that a huge file can be processed line by line without reading the whole thing into memory) and "generators" which yield a new item each time they are called.
But AFAIK we *do* say that, and it doesn't get through.
I think many people do get this, and that’s exactly what leads to confusion. They think that “lazy” and “iterator” (or “consumed when you loop over it”) go together. But they don’t. If you learned that “some Python objects are constructed as iterators because they want to be lazy”, and you know ranges are lazy, you’re liable to think that ranges are consumed when you loop over them, and if they know the term “iterator”, they’ll apply it to ranges (as so many people do—even people writing blog posts and StackOverflow answers). And if you think of files as _not_ lazy—because, after all, the lines do exist in advance on disk—then you expect them to be reusable in for loops, just like lists and dicts. (If you think about socket.makefile() or open('/dev/random') that would probably disabuse you of the notion, but how many novices are using those files?) You could explain this by further refining the concept of “lazy” to explain that files are lazy in the sense of processing, or heap usage, or something, not just ontological existence or whatever. But that’s pretty complicated. And it’s ultimately misleading, because it still gives people the wrong answer for ranges.
I can teach a child why a glass will break permanently when you hit it while a lake won’t by using the words “solid” and “liquid”.
Terrible example, since a glass is just a geologically slow liquid. ;-)
No, a glass is a solid. It doesn’t flow (except in the very loose sense that all solids do). And even if that factoid weren’t false, it would be a fact about physicists’ jargon, not about the everyday words. If I ask you to bring a fruit salad to the potluck and you show up with tomatoes, peas, peanuts, wheat grains, and eggplants but no strawberries, nobody is going to be impressed.
Back to the discussion: the child can touch both, and does so frequently (assuming you don't feed them from the dog's bowl and also bathe them regularly). They've seen glasses break, most likely, and splashed water.
And someone learning Python does get to touch both things here. They get lists, dicts, and ranges, and they get files, zips, and enumerate. Both categories come up pretty early in learning Python, just like both solids and liquids come up pretty early in learning to be human.
Iterators have one overriding purpose: to be fed to *for* statements, be exhausted, and then discarded. This is so important that it's done implicitly and in every single *for* statement. We have the necessary word, "iterator," but students don't have the necessary experience of "touching" the iterator that *for* actually iterates over instead of the list that is explicit in the *for* statement. That iterator is created implicitly and becomes garbage as soon as the *for* statement finishes. And there's no way for the student to touch it, it doesn't have a name!
No, it’s iterables whose purpose is being fed to a for statement. Yes, iterators are what for statements use under the covers to deal with iterables, but you don’t need to learn that until well after you’ve learned that iterators are what you get from open and zip.
If you want to fix nomenclature, don't call them "files," don't call them "file objects," call them "file iterators". Then students have an everyday iterator they can touch. I'll guarantee that causes other problems, though, and gets a ton of resistance. Even from me. :-)
You don’t have to call them “file iterators”, you just have to have to word “iterator” lying around to teach them when they ask why they can’t loop over a file twice. Which we do. In the same way, you don’t need to call lists “list iterables”, you just need to have the word “iterable” lying around to teach them when they ask what other kinds of things can go in a for loop. (As either you or Christopher said, it’s not a great word, but that’s another problem.) And you don’t need to call lists “list collections”, you just need to have the word “collection” lying around to teach them when they ask why ranges and lists and dicts let you loop over their values over and over. And that’s the word we don’t have. Which is why people keep trying to use the word “sequence” when it isn’t appropriate (calling a dict a sequence is very misleading—and range/xrange had the same problem before 3.2), or talk about “laziness” when it’s the wrong concept (ranges are lazy), etc. And it’s why I used the word “collection” even though it’s also incorrect, and had to follow up later in this paragraph to clarify, because not all of these things are sized containers (and maybe even vice-versa?), but that’s what “collection” means in Python. Because we have a concept and we don’t have a word for it.
Yes, and defining terminology for the one distinction that almost always is relevant helps distinguish that distinction from the other ones that rarely come up. Most people (especially novices) don’t often need to think about the distinction between iterables that are sized and also containers vs. those that are not both sized and containers, so the word for that doesn’t buy us much. But the distinction between iterators and things-like-list-and-so-on comes up earlier, and a lot more often, so a word for that would buy us a lot more.
We have that word and distinction. A file object *is* an iterator. A list is *not* an iterator. *for* works *with* iterators internally, and *on* iterables through the magic of __iter__.
“Not an iterator” is not a word. Of course you _can_ talk about things that don’t have names by being circuitous, but it’s harder. In theory, you could build a language out of any set of categories that carve up the world, and build all of the rest by composition. We don’t need the word “bird” when we could say “diapsids that aren’t reptiles”, or “liquid” when we could say “compressed matter that isn’t solid” or “fluid that isn’t gas or plasma”. Such a language would technically be able to discuss all the same things as English—but it would make communication much harder. And thinking clearly, too—human brains work better when the categories picked out by language are a rough match for the categories they need to think about than when they aren’t. And in practice, people do need to think about “things that can be looped over repeatedly and give you their values over and over”, and having to say “iterables that are not iterators” may be technically sufficient, but practically it makes communication and thought harder. It means we have to be more verbose and less to the point, and people make silly mistakes like the one in the parent thread, and people make more serious mistakes like teaching others that ranges are iterators, and then having to speak circuitously makes it harder to explain their mistakes to them.
But you *don't* use seek(0) on files (which are not iterators, and in fact don't actually exist inside of Python, only names for them do). You use them on opened *file objects* which are iterators. A file object is a file, in the same way that a list object is a list and an int object is an int.
No, it's not the same: your level of abstraction is so high that you've lost sight of the iterable/iterator distinction. All of the latter objects own their own data in a way that a file object does not. All of the latter objects are different from their iterators (where such iterators exist), while the file object is not.
That really is the wrong distinction, both at the novice level and at the Python-ideas level. You’re talking about laziness again. And while (nearly) all iterators are lazy, not all lazy things are iterators. In what sense does a range own its data? It doesn’t store it anywhere; it creates it on demand by doing arithmetic on the things it actually does store. If you’re really careful you can sort of explain that one, but then in what sense does a dict_keys or a memoryview or an mmap “own” its data that a file doesn’t? And yet, they all work like lists.
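(A sketch of "a range doesn't store its data anywhere"; the fixed-size claim is CPython-specific:)

```python
import sys

small, big = range(10), range(10**12)
assert sys.getsizeof(small) == sys.getsizeof(big)  # same size regardless of length (CPython)
assert 999_999_999_999 in big         # answered by arithmetic, not by scanning
assert big[10**11] == 10**11          # indexing, likewise, without materializing anything
```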
The fact that we use “file” ambiguously for a bunch of related but contradictory abstractions (a stream that you can read or write, a directory entry, the thing an inode points to, a document that an app is working on, …) makes it a bit more confusing, but unfortunately that ambiguity is forced on people before they even get to their first attempt at programming, so it’s probably too late for Python to help (or hurt).
Agreed. I would be much happier if we could discuss an example that is *not* iterating over files but *does* come up every day on StackOverflow. Maybe zips would work but I'm not sure the motivation comes together the way it does for files (why do zips want to be lazy? what are the compelling examples for zip of "restarting the iteration where you left off" with a new *for* statement?)
I think zips want to be lazy for exactly the same reason dict_items want to be lazy. People had real-life code that was wasting too much time or space building a list that was usually only going to be used for a single pass through a loop, so Python fixed that by making them lazy. But notice that one of them is an iterator and the other isn’t. So the distinction between the two isn’t about laziness. So why are zips lazy iterators instead of lazy views? I think it comes down to historical reasons and implementation simplicity. Designing a view for zip would be harder than for dict.items (see Swift for evidence) because its inputs are so much more general. A lot of tricky questions come up about both the API design and the implementation, that all have obvious answers for dict_items but not for zip. Meanwhile, zip was invented as itertools.izip, and itertools is… well, it’s right there in the name. And it was invented before Python had lots of other views to inspire it. So, it’s no surprise that it was an iterator. And even when 3.0 came along, it was a lot easier to say “let’s move izip, ifilter, and imap out of itertools and replace the old list-producing functions” than to design something entirely new, which, in the absence of a really compelling need for something entirely new, should have won out, and did.
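The contrast between the two kinds of laziness is easy to see at the interpreter (a quick sketch):

```python
# A dict_items view is lazy but reusable; a zip object is lazy but one-shot.
d = {"a": 1, "b": 2}
items = d.items()
print(list(items))  # the view can be walked again and again
print(list(items))  # same pairs both times

z = zip("ab", [1, 2])
print(list(z))  # first pass yields the pairs
print(list(z))  # second pass yields nothing: the iterator is exhausted
```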
Lists, sets, ranges, dict_keys, etc. are not iterators. You can write `for x in xs:` over and over and get the values over and over. Because each time, you get a new iterator over their values.
You and I know that, because we know what an iterator is, and we know it's there because it has to be: *for* doesn't iterate anything but an iterator. But (except via a bytecode-level debugger) nobody has ever seen that iterator. You can use iter to get a similar iterator, of course, but it's not the same object that any for statement ever used. (Unless you explicitly created it with iter, but then you can re-run the for statement on it the way you do with a list.)
This is exactly why I wouldn’t explain it to a novice in terms of “for doesn’t iterate anything but an iterator”. Sure, you and I know that it does something nearly equivalent to calling iter() and then calls next() on the result until it receives a StopIteration, but that’s not why lists can be used in for loops; that’s just how Python does it. And in fact, if CPython had special-case opcodes for looping over old-style sequences or SequenceFast C sequences without ever creating the iterator, it wouldn’t change the visible behavior. In fact, under the covers, some C functions (like, IIRC, tuple.__new__) that accept any iterable do exactly that. It doesn’t change their observable behavior, so nobody needs to know. Of course when talking to you, or to Python-ideas, I can count on the fact that you know that iterators return self from iter(), and that “like a for loop” means “as if calling iter() and then calling next() repeatedly until an exception and swallowing the exception if it’s StopIteration”, but I don’t expect everyone who uses Python to know all of that.
Files, maps, zips, generators, etc. are not like that. They’re iterators. If you write `for x in xs:` twice, you get nothing the second time, because each time you’re using the same iterator, and you’ve already used it up. Because iter(xs) is xs when it’s a file or generator etc.
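A quick way to check which camp an object falls into is to ask whether iter() hands it back unchanged (a sketch, using io.StringIO as a stand-in for a real opened file, since both implement the same interface):

```python
import io

xs = [1, 2, 3]
print(iter(xs) is xs)        # False: a list hands out a fresh iterator each time
print(iter(xs) is iter(xs))  # False: two calls, two independent iterators

f = io.StringIO("a\nb\n")    # file-like object; a real open() behaves the same
print(iter(f) is f)          # True: a file object is its own iterator

g = (x for x in xs)
print(iter(g) is g)          # True: so is a generator
```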
Genexps are iterators, but generators (in the sense of the product of a def that contains "yield") are not even iterable. Those are iterator factories.
The word “generator” is ambiguous. The type with the name “generator” that’s publicly available as “types.GeneratorType” and testable with inspect.isgenerator and that has the attributes like gi_frame that the docs say all generators have, those are generator iterators. And the things testable with .__code__.co_flags & CO_GENERATOR, those are generator functions. They’re both called “generator” so often that you have to be careful to say “generator iterator” or “generator function” when it’s not clear from context which one you mean, but I think it’s pretty clear from the context “generators are iterators” and “if you write for x in xs:” and so on which one I meant.
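All of which can be verified at the interpreter (a quick sketch):

```python
import inspect
from inspect import CO_GENERATOR

def gen_fn():        # a "generator function": a def containing yield
    yield 1

g = gen_fn()         # a "generator iterator": what calling the function returns

print(inspect.isgeneratorfunction(gen_fn))            # True
print(inspect.isgenerator(g))                         # True
print(bool(gen_fn.__code__.co_flags & CO_GENERATOR))  # True
print(type(g).__name__)                               # 'generator'
```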
The only representation of files in Python is file objects—the thing you get back from open (or socket.makefile or io.StringIO or whatever else)—and those are iterators.
The thought occurred to me, "What if that was a bad decision? Maybe in principle files shouldn't be iterators, but rather iterables with a real __iter__ that creates the iterator." I realized that I'd already answered my own question in part: I find it easy to imagine cases where I'd want to get some lines of input from a file as a higher-level unit, then stop and do some processing. The killer app for me is mbox files. Another plausible case is reading top-level Lisp expressions from a file (although that doesn't necessarily divide neatly into lines). I also found it surprisingly complicated to think about the consequences of making that change to the type.
I think there’s an easier way to see why this was a good decision: because files have positions. (Or, if you prefer, because files are streams, which implies that they have positions.) We don’t have a read_at(pos, size) method, we have a read(size) method that reads from where you left off. Seeking does exist, but it’s secondary—and it works by changing where the file thinks it left off. Once you think of files as things that know where they are, it makes more sense to wrap an iterator, rather than a reusable iterable, around them. You could argue that having a position was a bad idea in the first place, that Python shouldn’t have done it just because C stdio does it (and Unix kernels make it easy). Sure, that would mean we couldn’t use sockets and pipes as files and it would be weird to deal with special Unix files like /dev/random, but none of those things are exactly fundamental to novices. And we could even have two abstractions—a “stream” is what we call a file today, a “file” is a higher-level thing that you can randomly access, iterate repeatably, or ask for a stream from, and novices would only have to learn files rather than streams (until they have to do something like dealing with an mbox). But nearly every other language and platform in use today does the same thing as Python (and C and UNIX). If you know FILE*, NSFileHandle, or whatever the thing is called in bash, PHP, Ruby, C#, Go, Lisp, etc., a Python file is the exact same thing. And vice versa. And if you need to deal with native Win32 file handles via pywin32, they work pretty much the same way as the files you already know; you just have to know how to change the spelling of some of the functions. And so on. That’s worth a lot. (Plus, two abstractions is always more to learn than one.)
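A small sketch of "files know where they are" (again with io.StringIO standing in for a real open file):

```python
import io

f = io.StringIO("one\ntwo\nthree\n")  # in-memory stand-in for an open file

first = f.readline()     # reads from the current position: 'one\n'
print(f.tell())          # the position has moved past the first line
for line in f:           # iteration picks up where the reads left off
    print(line, end="")  # 'two' and 'three', not 'one' again
f.seek(0)                # seeking rewinds by resetting the position
print(f.readline(), end="")  # back to 'one'
```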
Going back to the documentation theme, maybe one way to approach explaining iterators is to start with the use case of files as (non-seekable) streams, show how 'for iteration' can be "restarted" where you left off in the file, and teach that "this is the canonical behavior of iterators; lists etc are *iterable* because 'for' automatically converts them to iterators "behind the scenes".
I still think this is getting it backward. Iterating lists is more fundamental than iterating files. Possibly even iterating ranges is. And you don’t have to understand that it works by converting them to iterators to understand it. And even if you do understand that, it doesn’t really solve the problem, because “convert to an iterator behind the scenes” doesn’t really tell you that you can do that repeatedly and get independent results. Most other cases where Python converts something behind the scenes, like adding 2 to a float or a Fraction, this doesn’t matter. Nobody cares whether each time you add 2 you get the same 2.0 or a different one, or whether each time you write the same string to a text file you get the same UTF-8 bytes or a new one. Iterators probably aren’t the _only_ exception to that, but I’m pretty sure they’re the first one many people run into. On the other hand, this would certainly get the notion of “files are streams” across to novices (as opposed to people coming from other languages) faster and more easily than we do today, which might help a lot of them. It might even turn out to solve the “why can’t I loop over this file twice” question for a lot of people in a different way, and that different way might be something you could build on to explain the difference between zip and range. “Like a stream” is much more accurate than “because it wants to be lazy”, and maybe easier to understand as well.

Andrew Barnert writes:
The answer is that files are iterators, while lists are… well, there is no word.
As Chris B said, sure there are words: File objects are *already* iterators, while lists are *not*. My question is, "why isn't that instructive?"
Well, it’s not _completely_ not instructive, it’s just not _sufficiently_ instructive.
Language is more useful when the concepts it names carve up the world in the same way you usually think about it.
True. But that doesn't mean we need names for everything. In your "phases of matter" example, there are two characteristics, fluidity (which gases and liquids have, but solids don't) and compressibility (which gases have, but neither solids nor liquids do). Here the tripartite vocabulary makes sense, since they're orthogonal, and (in our modern world) all three concepts are everyday experience.
Yes, it’s true that we can talk about “iterables that are not iterators”. But that doesn’t mean there’s no need for a word.
True, but that also doesn't mean there *is* need for a word.
We don’t technically need the word “liquid” because we could always talk about “compressibles that are not solid” (or “fluids that are not gas”)
True, but neither "compressibles" nor "fluids" "is a thing". Instead, in everyday language "fluid" is pretty much synonymous with "liquid", and AFAIK there are no compressibles that aren't fluids, so "compressible" is pretty much purely an adjective. OTOH, it's useful to pick out each phase of matter separately. You haven't made an argument that it's useful to pick out "iterables that aren't iterators" separately yet, except that you believe that a word would help (which to me is evidence for the need, but not very strong evidence).

The reason I'm quite unpersuaded is that there's also a concept of marked vs. unmarked in linguistics. Marked concepts are explicitly indicated; unmarked concepts require an explicit contrast with the marked concept, or they get folded into the generic word, leaving some ambiguity that gets resolved by context. (This can get really persnickety, with no obvious rules even in the same domain. For example, with gender, "he" is unmarked, and you need to disambiguate "male person" from "person of unknown gender" from context, at least in traditional English grammar. While "she" is marked. By contrast, "male" and "female" are both unambiguous.)

Now, it seems to me that we are only ever going to discuss iterators in the context of iteration, which means our domain of discourse is pretty much restricted to iterables. (In the sense that there's nothing left to discuss about iteration once you've classed an entity as "not iterable".) Given the way iterable and iterator are defined, it seems perfectly reasonable to me that iterator would be marked, non-iterator iterable left to its own devices, and the word "iterable" disambiguated from context, or perhaps marked with some fairly clumsy modifier.

So how can one explain "the problem with re-iterating files"? Here's how I would (now that I've thought more about it than I should ;-):

Student: OK, so we use 'for' to iterate over lists etc. And it's cool that we can do "for line in file". But how come if I need to do it twice, with lists I can just use a new 'for' statement, but with files nothing useful happens?

Teacher: That's a good question. You know that "things we can use in a for statement" are called "iterables", right? Well, files are a special kind of iterable called an "iterator", and you can "start them where you left off" with a new 'for' statement.

Student: But the 'for' statement runs out! You don't want to restart in the middle!

Teacher: Exactly! And that's why nothing useful happens when you use a second for statement on an already-open file. But you can use 'break' to stop partway through.

Student: Huh? What's that good for?

Teacher: [Gives relevant example: paragraph-wise processing in text files with empty-line paragraph breaks, message-wise processing in mbox files, etc.]

Student: Well, OK. But that's not what I expected or wanted.

Teacher: [Presses "play" on Rolling Stones tune cued up for this moment. Continues as voice-over.] True enough. I wasn't there when they designed this interface to files, so I'm not sure of all the reasons, but I do find it useful for the kind of processing I described earlier. Of course, you can get the effect you want by using 'open' again. It's a little annoying that *you* have to remember to do this. Also, there is a way to reset files the way you want. Just use the '.seek(0)' method on the file before the second 'for' statement.

Student: Hey, wait! Suppose I wanted to "restart where I left off" in iterating over a list. I guess that just doesn't work?

Teacher: [Wishes she had more students like this.] Another good question. If you want to do that, you have to construct an iterator from the list: 'lit = iter(l)'. Now iterate over 'lit', and you can break in the middle and restart with a new 'for' statement, just like with files. It's a little annoying that you have to remember ...

Student: [clobbers teacher with a handy copy of Python Essential Reference]

The point of the little dialogue is that although the word "iterator" is used, the student only has to remember it until the end of any sentence in which it's used. I think the student's responses are quite natural, and they don't mention "iterator". I suspect this student won't remember 'iter' but I bet she does remember '.seek(0)'.

On the other hand, what is there to explain *specifically* about iterables that aren't iterators that explaining about iterables doesn't do just as well? I guess there's the inverse of the "why doesn't it work with files?" question, but does that ever get asked? Surely almost all students encounter iteration over sequences first, and only later over iterators?
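The teacher's last suggestion, spelled out as code (a minimal sketch):

```python
l = ["red", "green", "blue", "yellow"]
lit = iter(l)            # an explicit list iterator

for color in lit:
    if color == "green":
        break            # stop partway through

for color in lit:        # restarts where we left off, just like a file
    print(color)         # 'blue', then 'yellow'
```

The list itself is untouched; only the explicit iterator remembers its position.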
2. The *for* statement and the *next* builtin require an iterator object to work. Since for *always* needs an iterator object, it automatically converts the "in" object to an iterator implicitly. (Technical note: for the convenience of implementors of 'for', when iter is applied to an iterator, it always returns the iterator itself.)
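For the record, that technical note can be spelled out as roughly equivalent Python (a sketch of the semantics, not CPython's actual implementation):

```python
xs = [10, 20, 30]
result = []

# What 'for x in xs: result.append(x)' does behind the scenes:
it = iter(xs)            # for always starts by asking for an iterator
while True:
    try:
        x = next(it)     # ...then calls next() repeatedly...
    except StopIteration:
        break            # ...until the iterator is exhausted
    result.append(x)

print(result)            # [10, 20, 30]
print(iter(it) is it)    # True: iter() applied to an iterator returns it unchanged
```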
I think this is more complicated than people need to know, or usually learn. People use for loops almost from the start, but many people get by with never calling next. All you need is the concept “thing that can be used in a for loop”, which we call “iterable”.
Conceded. "Had I only more time, I would have written a much shorter post."
“Iterable” is the fundamental concept.
We agree on this too.
Of course you will need to learn the concept “iterator” pretty soon anyway, but only because Python actually gives you iterators all over the place. [...] You want to know whether they can be used in for loops
I think now you are over-thinking this. Iterators *are* iterables. You have one because somebody told you it's iterable, and you want to use it in a 'for' loop. You only need to know that it's an iterator if you want to re-iterate from the beginning, rather than re-start from where you left off. "Iterator" is the marked case. But the "marker" is that you find out about it when it doesn't "do what I meant".
I think many people do get this, and that’s exactly what leads to confusion. They think that “lazy” and “iterator” (or “consumed when you loop over it”) go together. But they don’t.
I'll grant that my words admit such confusion, especially if people are predisposed to it. I think they are. After all, none of your "many people" have read my thoughts on the matter before this thread! Just as there are times when LBYL is the appropriate programming technique (even though EAFP is possible), sometimes people who don't read the whole relevant manual section in advance are going to get burned by their guesses and analogies (especially if they got them from others of the same type).
Back to the discussion: the child can touch both, and does so frequently (assuming you don't feed them from the dog's bowl and also bathe them regularly). They've seen glasses break, most likely, and splashed water.
And someone learning Python does get to touch both things here. They get lists, dicts, and ranges, and they get files, zips, and enumerate. Both categories come up pretty early in learning Python, just like both solids and liquids come up pretty early in learning to be human.
No, they don't, in a sense I explained. Until the student has a use case where they need to restart (either where they left off or from the beginning) they can't tell the difference because they just put the whatever in a 'for' statement which works like magic -- and to them it is pure magic, because they don't know what iterable or iterator or __iter__ or iter or __next__ or next are. They just know you can use lists and some other things in a 'for' statement. The restart distinction may not come up for a long time. I didn't really have a use case for it, until one time I wanted to do something with mbox files and I didn't like what the mailbox module does. So I had to roll my own.
No, it’s iterables whose purpose is being fed to a for statement.
I disagree, both in the abstract (Sequences are iterable, but don't necessarily have an __iter__, and so I don't see how you can support your assertion that their purpose is to be fed to 'for') and in the concrete (lots of iterables with __iter__ are instantiated and never intended to be iterated, yet are useful). By contrast, every iterator has an __iter__, and the technical term for an iterator that is never iterated is "garbage".
Yes, iterators are what for statements use under the covers to deal with iterables, but you don’t need to learn that until well after you’ve learned that iterators are what you get from open and zip.
True enough, my bad. I was confounding two documentation problems there. One is teaching new users, and the other is helping experts get it exactly right. I've mixed them up quite a bit, but my list of 5 points should be thought of as aimed at a concise but comprehensive description rather than a tutorial.
You don’t have to call them “file iterators”, you just have to have the word “iterator” lying around to teach them when they ask why they can’t loop over a file twice. Which we do.
Eh, that's my argument. :-)
In the same way, you don’t need to call lists “list iterables”[.]
And there's no way that I would. "Iterable" is an adjective. The usage "iterables" for the class of iterable objects is something of an abuse.[2] My point about files is that they're the thing I would expect would be most folks' first unpleasant encounter with an exhausted iterator object, and by naming them as "file iterators" you might be able to induce a lot of "a ha!" moments. You come around to a related suggestion below. I admit that the "file iterator" suggestion is pretty implausible.
You just need to have the word “iterable” lying around to teach them when they ask what other kinds of things can go in a for loop.
I don't think you meant to write that: when they ask that, you don't say "iterables, of course", you say "tuples, sets, and perhaps surprisingly dicts, as well as dict views, and many other things." It's only when you or the student need a name for that whole class that you bring up the term "iterable" (at least in its noun form). But I don't think that comes up, at least on the student side, for quite a while. A good student might ask "what else is iterable?" but "What else can I use in a 'for' statement?" is perfectly serviceable. I suppose the teacher might find it painful to completely avoid the term "iterable" (especially as an adjective, and "iterator", for that matter), but I would solve that problem as in the dialog: just use them in such a way that the student doesn't need to remember them. I think that's quite do-able, even natural. I do not claim this leaves the student with a complete and satisfactory understanding of the concept of iterator, merely that it allows them to understand the difference between iterables that start from where they left off and those that begin again at the beginning.
And you don’t need to call lists “list collections”, you just need to have the word “collection” lying around to teach them when they ask why ranges and lists and dicts let you loop over their values over and over.
Have you ever been asked that, outside of the context of explaining why files, zips, etc. don't allow re-iteration from the start? Has anyone come to you puzzled because the second loop over a list did useful work?
We have that word and distinction. A file object *is* an iterator. A list is *not* an iterator. *for* works *with* iterators internally, and *on* iterables through the magic of __iter__.
“Not an iterator” is not a word. Of course you _can_ talk about things that don’t have names by being circuitous, but it’s harder.
Or you can not talk about them at all. This is very frustrating, because I agree with everything you say as a general principle, but your concrete discussion never refers to iterators or iterables. It's always an analogy to birds and reptiles and plasmas and liquids. I think that analogy breaks down because I doubt new programmers get confused by the fact that they can re-iterate over lists. Like, not ever. I'd even bet that students who try breaking out, then restarting where they left off, and have it fail by restarting from the beginning, are disappointed but not shocked. So when do you *need* to talk about non-iterator iterables? Outside of threads like this one?
And in practice, people do need to think about “things that can be looped over repeatedly and give you their values over and over”, and having to say “iterables that are not iterators” may be technically sufficient, but practically it makes communication and thought harder.
Or you can just treat "things that can be looped over repeatedly and give you their values over and over" as the unmarked case of "iterable", and speak of "iterators" when you need to distinguish the marked case.[3] Use of "marking" is something we do all the time. I can't say for sure that it would work here, but nothing you've written yet convinces me it wouldn't.
It means we have to be more verbose and less to the point,
It doesn't mean we *have* to be more verbose, in principle. "Marking" works fine in natural language, just as anaphoric "it" does. I may be missing something, but you need to be more concrete about what the need for this word (yet to be named) is.
and people make silly mistakes like the one in the parent thread, and people make more serious mistakes like teaching others that ranges are iterators,
Indeed they do. I don't think that has as much to do with people not having a word for iterables that aren't iterators as it does with them not understanding what an iterator is. Just because you have a word, say "nandaro", for iterables that aren't iterators doesn't mean that otherwise well-informed people will correctly classify ranges as nandaro rather than incorrectly as iterators. As far as I can tell, most of the rest of your post addresses an argument that I'm not making, and I don't know how to do it better, so I'm just going to let it rest there. As mentioned above, this captures a good bit of what I'm trying to get at:
On the other hand, this would certainly get the notion of “files are streams” across to novices (as opposed to people coming from other languages) faster and more easily than we do today, which might help a lot of them. It might even turn out to solve the “why can’t I loop over this file twice” question for a lot of people in a different way, and that different way might be something you could build on to explain the difference between zip and range. “Like a stream” is much more accurate than “because it wants to be lazy”, and maybe easier to understand as well.
Footnotes: [1] Or maybe "marked" doesn't apply here because those words are on equal footing -- I'm not a linguist, I've just heard the concept discussed by real linguists. [2] Linguists have a technical term for this kind of "abuse" but I don't remember it. [3] I recognize that you can create objects that break this dichotomy. I doubt they're important enough to impede discussion for lack of the word for "non-iterator iterables". Again, concrete examples would really help.

On Wed, May 13, 2020 at 10:51:58AM -0700, Andrew Barnert via Python-ideas wrote:
Students often want to know why this doesn’t work:
[snip example showing the difference between iterators which become exhausted after iteration, and sequences that don't]
The answer is that files are iterators, while lists are… well, there is no word.
Sequences. Containers. Non-iterator-iterables. There's three words :-) Albeit one of them is hyphenated, but if we were German we might call it a noniteratoriterablesequenceobject and abbreviate it to NIISO :-) Oh, there's also subscriptable, to describe things that can be subscripted. Or there's "list-like". The glossary defines all of Iterator, Iterable, Sequence but not Container: https://docs.python.org/3/glossary.html There's no short, common word specifically for iterables that aren't iterators for the same reason that there's no short, common word specifically for dogs that aren't poodles, plants that aren't roses, or programming languages that aren't Python :-)
A file object is a file, in the same way that a list object is a list and an int object is an int.
A file object is a proxy to a file. It doesn't itself live in a file system, but it points to an entity which does. In the same way that it is both useful and necessary to distinguish between an iterable that obeys the iterator protocol and one that does not, it's useful and necessary to distinguish between an in-memory file object and the file it points to. File objects have a close method; files don't. And as you point out, file objects are iterators, but files are just a bunch of bytes on a hard drive. -- Steven

On Mon, May 11, 2020 at 02:37:15PM +0900, Stephen J. Turnbull wrote:
I think part of the problem is that people rarely see explicit iterator objects in the wild. Most of the time we encounter iterator objects only implicitly. Nomenclature *is* a problem (I still don't know what a "generator" is: a function that contains "yield" in its def, or the result of invoking such a function),
Strictly speaking, a function containing yield is a "generator function", the object it returns is a "generator iterator", or just generator. Analogy: "a float" is the object returned by the `float()` function, not the function itself. People are often sloppy in their terminology, but the inspect module makes it clear: https://docs.python.org/3/library/inspect.html#inspect.isgeneratorfunction Alas, the glossary is itself a bit sloppy: you have to read it with care to see it talks about *generator functions* and *generator iterators*. The entries for async generators are better. https://docs.python.org/3/glossary.html -- Steven

+1 I like this! I never considered this idea. It's a good combination of efficiency and elegance. On Sat, May 9, 2020 at 10:41 PM Christopher Barker <pythonchb@gmail.com> wrote:
Funny you should bring this up.
I've been meaning, for literally years, to propose not quite this, but adding a "slice iterator" to the sequence protocol.
(though note that one alternative is adding slice syntax to itertools.islice)
I even got so far as to write a draft PEP and prototype.
NOTE: I'm not saying this is ready for a PEP, but it was helpful to use the format to collect my thoughts.
https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst
And the prototype implementation:
https://github.com/PythonCHB/islice-pep/blob/master/islice.py
I never got around to posting here, as I wasn't quite finished, and was waiting 'till I had time to deal with the discussion.
But since it was brought up -- here we go!
If folks have an interest in this, I'd love to get feedback.
-CHB
On Sat, May 9, 2020 at 3:51 AM Chris Angelico <rosuav@gmail.com> wrote:
On Sat, May 9, 2020 at 8:00 PM Alex Hall <alex.mojaki@gmail.com> wrote:
On Sat, May 9, 2020 at 11:15 AM Ram Rachum <ram@rachum.com> wrote:
Here's an idea I've had. How about instead of this:
itertools.islice(iterable, 7, 20)
We'll just have:
itertools.islice(iterable)[7:20]
Advantages: 1. More familiar slicing syntax. 2. No need to awkwardly use None when you're interested in just
specifying the end of the slice without specifying the start, i.e. islice(x)[:10] instead of islice(x, None, 10)
3. Doesn't require breaking backwards compatibility.
What do you think?
Looking at this, my train of thought was:
While we're at it, why not allow slicing generators?
Bear in mind that islice takes any iterable, not just a generator. I don't think there's a lot of benefit in adding a bunch of methods to generator objects - aside from iteration, the only functionality they have is coroutine-based. There's no point implementing half of itertools on generators, while still needing to keep itertools itself for all other iterables.
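A quick illustration of that point (islice is happy with anything iterable):

```python
from itertools import islice

# islice accepts any iterable, not only generators:
print(list(islice(range(100), 7, 12)))         # a range: [7, 8, 9, 10, 11]
print(list(islice((c for c in "abcdef"), 2)))  # a genexp: ['a', 'b']
print(list(islice({"x": 1, "y": 2}, 1)))       # even a dict (its keys): ['x']
```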
And if we do that, what about regular indexing? But then, what if I do `gen[3]` followed by `gen[1]`? Is it an error? Does the generator have to store its past values? Or is `gen[1]` the second item after `gen[3]`? Or wherever the generator last stopped?
It makes no sense to subscript a generator like that.
Well that's probably why I can't index or slice generators - so that code doesn't accidentally make a mess trying to treat a transient iterator the way it does a concrete sequence. A generator says "you can only iterate over me, don't try anything else".
Which leads us back to your proposal. `islice(iterable)[7:20]` looks nice, but it also allows `foo(islice(iterable))` where `foo` can do its own indexing and that's leading to dangerous territory.
If foo can do its own indexing, it needs to either specify that it takes a Sequence, not just an Iterable, or alternatively it needs to coalesce its argument into a list immediately. If it's documented as taking any iterable, it has to just iterate over it, without subscripting.
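A sketch of the two options (the function names here are made up for illustration):

```python
from collections.abc import Sequence
from itertools import islice

def head3(xs):
    """Documented as taking any iterable: just iterate, never subscript."""
    return list(islice(xs, 3))

def head3_seq(xs):
    """Needs real indexing, so it demands (or builds) a concrete sequence."""
    if not isinstance(xs, Sequence):
        xs = list(xs)        # coalesce the iterable into a list immediately
    return list(xs[:3])

print(head3(n * n for n in range(10)))  # works on a generator: [0, 1, 4]
print(head3_seq(range(10)))             # range is a Sequence, no copy needed
```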
ChrisA _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/WADS4D... Code of Conduct: http://python.org/psf/codeofconduct/
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sat, May 9, 2020 at 9:41 PM Christopher Barker <pythonchb@gmail.com> wrote:
Funny you should bring this up.
I've been meaning, for literally years, to propose not quite this, but adding a "slice iterator" to the sequence protocol.
(though note that one alternative is adding slice syntax to itertools.islice)
I even got so far as to write a draft PEP and prototype.
NOTE: I'm not saying this is ready for a PEP, but it was helpful to use the format to collect my thoughts.
https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst
And the prototype implementation:
https://github.com/PythonCHB/islice-pep/blob/master/islice.py
I think this is a good idea. For sequences I'm not sure how big the benefit is - I get that it's more efficient, but I rarely care that much, because most lists are small. Why not extend the proposal to all iterators, or at least common ones like generators? That would allow avoiding itertools when I have no other choice.

You write "This PEP proposes that the sequence protocol be extended". What does that mean exactly? I assume you don't want to magically add an `islice` property to every class that has `__len__` and `__getitem__`. Will you just add it to `collections.abc.Sequence`, the builtins, and the stdlib?

Perhaps this could come with some new syntax? My first thought was `iterator(1:2)`, the idea being that changing the brackets would give it a lazy iterator feel, the same way that changing the brackets on a list comprehension turns it into a generator. But it probably looks too much like a function call. So maybe we can play with double brackets instead:

```
import itertools

for (l1, r1), (l2, r2) in itertools.product('() {} []'.split(), repeat=2):
    print(f'sequence{l1}{l2}1:2{r2}{r1}')
```

which prints:

```
sequence((1:2))
sequence({1:2})
sequence([1:2])
sequence{(1:2)}
sequence{{1:2}}
sequence{[1:2]}
sequence[(1:2)]
sequence[{1:2}]
sequence[[1:2]]
```

On Sat, May 9, 2020 at 1:58 PM Alex Hall <alex.mojaki@gmail.com> wrote:
I think this is a good idea. For sequences I'm not sure how big the benefit is - I get that it's more efficient, but I rarely care that much, because most lists are small. Why not extend the proposal to all iterators, or at least common ones like generators?
Because the slice syntax, so far at least, only applies to Sequences. And in general, you can't use the full slice syntax on iterators anyway (they don't have a length). And well, iterators are already iterators ... so there isn't an "extend the proposal" here at all. But without thinking about it much, I'm not sure adding slice syntax to iterators in general makes sense -- slicing is quite connected to indexing, which iterators don't support.
That would allow avoiding itertools when I have no other choice.
Reading the thread on adding "strict" to zip, I'd say "avoiding itertools" is not really a goal of most folks :-) You write "This PEP proposes that the sequence protocol be extended". What
does that mean exactly? I assume you don't want to magically add an `islice` property to every class that has `__len__` and `__getitem__`. Will you just add it to `collections.abc.Sequence`, the builtins, and the stdlib?
Details to be worked out. Python has evolved over the years from protocols to ABCs, and it's not fully clear yet how this would fit in. But even then: probably, yes, "add it to `collections.abc.Sequence`, the builtins, and the stdlib". I'm not sure every class that has `__len__` and `__getitem__` could magically grow a new method, and I sure wouldn't want them to. In fact, in theory, you'd want every class that supports slicing to grow this functionality, but I don't know of any way to tell whether a class, in general, supports slicing. Perhaps this could come with some new syntax? My first thought was
`iterator(1:2)`, the idea being that changing the brackets would give it lazy iterator feel the same way that changing the brackets on a list comprehension turns it into a generator. But it probably looks too much like a function call.
It doesn't just look like one -- it would clash. Remember that "iterable" and "iterator" are protocols -- anything can support them. That would make it impossible to have a callable be an iterator.
So maybe we can play with double brackets instead:
What would this new syntax do, regardless of what it was? I'm not sure I follow. My idea is about creating a view iterable on sequences; I'm not sure what a view iterable on an iterable would be?

-CHB

-- Christopher Barker, PhD

Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sun, May 10, 2020 at 5:00 AM Christopher Barker <pythonchb@gmail.com> wrote:
On Sat, May 9, 2020 at 1:58 PM Alex Hall <alex.mojaki@gmail.com> wrote:
I think this is a good idea. For sequences I'm not sure how big the benefit is - I get that it's more efficient, but I rarely care that much, because most lists are small. Why not extend the proposal to all iterators, or at least common ones like generators?
Because the slice syntax, so far at least, only applies to Sequences. And in general, you can't use the full slice syntax on iterators anyway (they don't have a length).
Is that a major problem? Just use the syntax the way you would use itertools.islice. Incidentally, I imagine we could allow itertools.islice(iterable, 0, -n) by iterating ahead by n items, keeping them in memory, and stopping when the lookahead runs out. I can understand, though, that this comes with some caveats.
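[Editorial sketch, not part of the original message: the lookahead-buffer idea Alex describes, supporting a negative stop by holding the last n items in memory. `islice_to_neg` is an invented name.]

```python
from collections import deque
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def islice_to_neg(iterable: Iterable[T], stop: int) -> Iterator[T]:
    """Yield everything except the last -stop items, i.e. iterable[:stop]
    for a negative stop, without needing a length."""
    n = -stop
    buf: deque = deque(maxlen=n)  # holds the n most recent items
    for item in iterable:
        if len(buf) == n:
            # The oldest buffered item is now known to be at least
            # n items from the end, so it is safe to emit.
            yield buf.popleft()
        buf.append(item)
    # When the input runs out, buf holds exactly the last n items,
    # which are discarded.

print(list(islice_to_neg(iter(range(10)), -3)))
```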
and well, iterators are already iterators ... so there isn't an "extend the proposal" here at all.
I don't know what you're saying here.
But without thinking about i much, I'm not sure adding slice syntax to iterators in general makes sense -- slicing is quite connected to indexing, which iterators don't support.
Again, it would make exactly as much sense as itertools.islice.
That would allow avoiding itertools when I have no other choice.
reading the thread on adding "strict" to zipk I'd say "avoiding itertools" is not really a goal of most folks :-)
Most of the points you made in your PEP still apply. itertools.islice is verbose and not particularly readable. I'm not suggesting doing this just to make things a little more convenient.
Perhaps this could come with some new syntax? My first thought was
`iterator(1:2)`, the idea being that changing the brackets would give it lazy iterator feel the same way that changing the brackets on a list comprehension turns it into a generator. But it probably looks too much like a function call.
doesn't just look like one -- it would clash. Remember that "iterable" and "iterator" is a protocol -- anything can support it. That would make it impossible to have a callable be an iterator
`iterator(1:2)` isn't a function call, it isn't valid syntax. The colon would distinguish an islice from a call. But again, while it's technically unambiguous, it can still be confusing, and you've proven that point.
So maybe we can play with double brackets instead:
What would this new syntax do, regardless of what it was? I"m not sure I follow. My idea is about creating a view iterable on sequences, I'm not sure what view iterable on an iterable would be?
`iterator(1:2)` would compile to roughly `itertools.islice(iterator, 1, 2)`, which would handle the rest at runtime:

- Failing if `iterator` isn't actually an iterator.
- Failing if a negative index is used on something without a length.
- Handling negative indices for sequences (is there any reason we don't have that now?)
- Possibly deferring to a dunder like `__islice__` if one is defined, so that some classes (e.g. lists) can return a clever view or something if they want.
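[Editorial sketch, not part of the original message: a rough runtime helper along the lines Alex describes. `apply_islice` and the `__islice__` dunder are hypothetical; nothing like this exists in CPython.]

```python
import itertools

def apply_islice(obj, start, stop, step=None):
    # Hypothetical helper that `iterator(start:stop:step)` could desugar to.
    dunder = getattr(type(obj), "__islice__", None)
    if dunder is not None:
        # A class such as list could return a clever view here.
        return dunder(obj, slice(start, stop, step))
    if not hasattr(obj, "__next__"):
        raise TypeError(f"{type(obj).__name__!r} object is not an iterator")
    # itertools.islice accepts None for start and step.
    return itertools.islice(obj, start, stop, step)

gen = (x * x for x in range(10))
print(list(apply_islice(gen, 1, 4)))  # slice of a generator
```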

On May 10, 2020, at 02:42, Alex Hall <alex.mojaki@gmail.com> wrote:
- Handling negative indices for sequences (is there any reason we don't have that now?)
Presumably partly just to keep it minimal and simple. Itertools is all about transforming iterables into other iterables in as generic a way as possible. None of the other functions do anything special if given a more fully-featured iterable.

But also, negative indexing isn’t actually part of the Sequence protocol. (You don’t get negative indexes for free by inheriting Sequence as a mixin, nor is it ensured by testing isinstance with Sequence as an ABC.) It’s part of the extra stuff that list and the other builtin sequences happen to do. You didn’t suggest allowing negative islicing on set even though it could just as easily be implemented there, because you don’t expect negative indexing as part of the Set protocol (or the Sized Iterable protocol); you did expect it as part of the Sequence protocol, but Python’s model disagrees.

Maybe practicality beats purity here, and islice should take negative indices on any Sequence, or even Sized, input, even though that makes it different from other itertools functions, and ignores the fact that it could be simulating negative indexing on some types where it’s meaningless. But how often have you wanted to call islice with a negative index? How horrible is the workaround you had to write instead? I suspect that it’s already rare enough of a problem that it’s not worth it, and that any form of this proposal would make it even rarer, but I could be wrong.
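[Editorial sketch, not part of the original message: Andrew's point that inheriting Sequence does not grant negative indexing can be checked directly. `Digits` is a made-up minimal example.]

```python
from collections.abc import Sequence

class Digits(Sequence):
    # Minimal Sequence: the mixin supplies __iter__, __contains__,
    # index(), etc., but __getitem__ is ours, and nothing in the ABC
    # makes it handle negative indices.
    def __getitem__(self, i):
        if 0 <= i < 10:
            return i
        raise IndexError(i)

    def __len__(self):
        return 10

d = Digits()
print(isinstance(d, Sequence))  # the ABC check passes...
print(d[3])                     # ...and plain indexing works...
try:
    d[-1]                       # ...but negative indexing does not.
except IndexError:
    print("no negative indexing for free")
```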

On Sun, May 10, 2020 at 8:20 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On May 10, 2020, at 02:42, Alex Hall <alex.mojaki@gmail.com> wrote:
- Handling negative indices for sequences (is there any reason we don't
have that now?)
Presumably partly just to keep it minimal and simple. Itertools is all about transforming iterables into other iterables in as generic a way as possible. None of the other functions do anything special if given a more fully-featured iterable.
But also, negative indexing isn’t actually part of the Sequence protocol. (You don’t get negative indexes for free by inheriting Sequence as a mixin, nor is it ensured by testing isinstance with Sequence as an ABC.) It’s part of the extra stuff that list and the other builtin sequences happen to do. You didn’t suggest allowing negative islicing on set even though it could just as easily be implemented there, because you don’t expect negative indexing as part of the Set protocol (or the Sized Iterable protocol); you did expect it as part of the Sequence protocol, but Python’s model disagrees.
I understand that, but the same could be said about all forms of slicing. It's not part of the sequence protocol, it's not provided by the ABC, it's just a nice thing that lists do. Maybe practicality beats purity here, and islice should take negative
indices on any Sequence, or even Sized, input, even though that makes it different from other itertools functions, and ignores the fact that it could be simulating negative indexing on some types where it’s meaningless. But how often have you wanted to call islice with a negative index? How horrible is the workaround you had to write instead? I suspect that it’s already rare enough of a problem that it’s not worth it, and that any form of this proposal would make it even rarer, but I could be wrong.
You're right, I don't really care about islice accepting negative indices in isolation. But it's different in the context of my form of this proposal, where a certain syntax delegates to islice (or something very close to it) and we want that syntax to support negative indexing.

On Sat, May 9, 2020 at 1:58 PM Alex Hall <alex.mojaki@gmail.com> wrote:
https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst
And the prototype implementation:
https://github.com/PythonCHB/islice-pep/blob/master/islice.py
I think this is a good idea. For sequences I'm not sure how big the benefit is - I get that it's more efficient, but I rarely care that much, because most lists are small. Why not extend the proposal to all iterators, or at least common ones like generators? That would allow avoiding itertools when I have no other choice.
I'm still confused what you mean by extend to all iterators? You mean that you could use slice syntax with anything iterable?

And where does this fit into the iterable vs iterator continuum? Iterables will return an iterator when iter() is called on them. So are you suggesting that another way to get an iterator from an iterable would be to pass a slice somehow that would return an iterator off that slice? So:

```
for i in an_iterable(a:b:c):
    ...
```

would work for any iterable? And use an iterator that would iterate as specified by the slice? That is kind of cool.

Though it is heading in a different direction than where Andrew was proposing, that this would be about making and using views on sequences, which really wouldn't make sense for any iterator. And, of course, adding syntax is a heavier lift (though extending a major built-in protocol is not a small lift by any means).

-CHB

-- Christopher Barker, PhD

Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Mon, May 11, 2020 at 12:50 AM Christopher Barker <pythonchb@gmail.com> wrote:
I'm still confused what you mean by extend to all iterators? You mean that you could use slice syntax with anything iterable?
And where does this fit in to the iterable vs iterator continuum?
iterables will return an iterator when iter() is called on them. So are you suggesting that another way to get an iterator from an iterable would be to pass a slice somehow that would return an iterator off that slice?
so:
for i in an_iterable(a:b:c): ...
would work for any iterable? and use an iterator that would iterate as specified by the slice?
Translate `an_iterable(a:b:c)` to `itertools.islice(an_iterable, a, b, c)`. From there your questions can be answered by playing with itertools.islice. It accepts any iterable or iterator and returns an iterator:

```
import itertools

s = itertools.islice([1, 2, 3], 2)
print(s)
assert s is iter(s)

s2 = itertools.islice(s, 1)
print(s2)
```
Though it is heading in a different direction that where Andrew was proposing, that this would be about making and using views on sequences, which really wouldn't make sense for any iterator.
The idea is that islice would be the default behaviour and classes could override that to return views if they want.

On May 11, 2020, at 10:57, Alex Hall <alex.mojaki@gmail.com> wrote:
On Mon, May 11, 2020 at 12:50 AM Christopher Barker <pythonchb@gmail.com> wrote:
Though it is heading in a different direction that where Andrew was proposing, that this would be about making and using views on sequences, which really wouldn't make sense for any iterator.
The idea is that islice would be the default behaviour and classes could override that to return views if they want.
It is possible to get both, but I don’t think it’s easy.

I think the ultimate unification of these ideas is the “views everywhere” design of Swift. Whether you have a sequence or just a collection or just a one-shot forward-only iterable, you use the same syntax and the same functions to do everything—copy-slicing, view-slicing, chaining, mapping, zipping, etc. And the result is always a view with as much functionality as makes sense (so filtering a sequence gives you a view that’s a reversible collection, not a sequence). So you can view-slice the result of a genexpr the same way you would a list, and you just get a forward-only iterable view instead of a full-fledged sequence view.

I’ve started designing such a thing multiple times, every couple years or so, and always realize it’s even more work than I thought and harder to fit into Python than I thought, and give up. But maybe doing it _just_ for view slicing, rather than for everything, and requiring a wrapper object to use it, is a lot simpler, and useful enough on its own. And that would fit well into the Python way of growing by adding stuff as needed, and only trying to come up with a complete and perfect general design up front when absolutely necessary.

On Mon, May 11, 2020 at 11:38 AM Andrew Barnert <abarnert@yahoo.com> wrote:
On May 11, 2020, at 10:57, Alex Hall <alex.mojaki@gmail.com> wrote:
On Mon, May 11, 2020 at 12:50 AM Christopher Barker <pythonchb@gmail.com> wrote:
Though it is heading in a different direction that where Andrew was proposing, that this would be about making and using views on sequences, which really wouldn't make sense for any iterator.
The idea is that islice would be the default behaviour and classes could override that to return views if they want.
I'm still confused about this -- islice returns an iterator that iterates over the passed-in iterable -- that is standard behavior for most tools in itertools.
So I can see that it would be nice to have a slice syntax that would work on all iterables, not just sequences, but I *think* that's a totally different idea.

Anyway, thanks all for the input. When I get a chance, I'll update my proposal with the input. I think I'll go for Andrew's idea of a sequence_view object -- that would give me my "lazy slice", and other nifty features.
But maybe doing it _just_ for view slicing, rather than for everything, and requiring a wrapper object to use it, is a lot simpler, and useful enough on its own.
I'm not quite sure what a "view for slicing" means, but maybe it's what I'm thinking about. But I would describe what I'm thinking about as a view object that you can get with slicing syntax. There are two key parts here -- we *could* have just an iterator with slice syntax, and also a view without slice syntax, but I'm all for getting them together.

Again, I welcome PRs on my notes and prototype code: https://github.com/PythonCHB/islice-pep

I'd particularly welcome text about the motivation and use-cases for a sequence view object -- my text is all about only the iterating part.

-CHB

-- Christopher Barker, PhD

Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sat, 9 May 2020 at 10:13, Ram Rachum <ram@rachum.com> wrote:
Here's an idea I've had. How about instead of this:
itertools.islice(iterable, 7, 20)
We'll just have:
itertools.islice(iterable)[7:20]
Advantages:

1. More familiar slicing syntax.
2. No need to awkwardly use None when you're interested in just specifying the end of the slice without specifying the start, i.e. islice(x)[:10] instead of islice(x, None, 10).
3. Doesn't require breaking backwards compatibility.
What do you think?
Why would you use islice(x, None, 10)? islice(x, 10) does the same... I guess you might occasionally do islice(x, None, end, step), but that seems fairly rare.

The proposed syntax looks cute, but I'm not sure it's an improvement. And you have to consider what happens when people start passing islice(x) objects around *without* immediately indexing them. You now have objects that (presumably) support slice indexing, but not indexing with an integer, and not len(). Those are bound to end up somewhere they shouldn't and break someone's assumptions...

These objections aren't showstoppers, but IMO they do far outweigh the relatively trivial benefits.

Paul

On Sat, May 9, 2020 at 12:58 PM Paul Moore <p.f.moore@gmail.com> wrote:
Why would you use islice(x, None, 10)? islice(x, 10) does the same... I guess you might occasionally do islice(x, None, end, step), but that seems fairly rare.
My mistake, I meant islice(x, 10, None) where you're doing the slice [10:]. I didn't think about the step case, that's another small argument in favor of this feature.

On May 9, 2020, at 02:12, Ram Rachum <ram@rachum.com> wrote:
Here's an idea I've had. How about instead of this:
itertools.islice(iterable, 7, 20)
We'll just have:
itertools.islice(iterable)[7:20]
I’ve actually built this.[1]

From my experience, it feels clever at first, but it can get confusing. The problem is that if you slice twice, or slice after nexting, you can’t get a feel for what the remaining values should be unless you work it through. Of course the exact same thing is true with using islice twice today, but you don’t _expect_ that to be comprehensible in terms of slicing the original iterable twice, while with slice notation, you do. Or at least I do; maybe that’s just me.

And meanwhile, even though the simple uses aren’t confusing, I’ve never had any code where it made things nicer enough that it seemed worth reaching into the toolbox. But again, maybe that’s just me.

If you want to play with this and can’t implement it yourself easily, I could dig up my implementation. But it’s pretty easy (especially if you don’t try to optimize and just have __getitem__ return a new islice around self).

---

[1] Actually, I built an incomplete viewtools (a replacement for itertools plus zip, map, etc. that gives you views that are reusable iterables and forward as much input behavior as possible—so map(lambda i: i*2, range(10)) is a sequence, while filter(lambda i: i%2, range(10)) is not a sequence but it is reversible, and so on) and then extracted and simplified the vslice because I thought it might be useful without the views stuff. (I also extracted and simplified it in a different way, as view slices that only work on sequences, and that actually did turn out to be occasionally useful.)
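[Editorial sketch, not part of the original message: the unoptimized approach Andrew mentions, where __getitem__ just returns a new islice around self. The class name is invented; this is a toy, not his actual implementation.]

```python
import itertools

class sliceable_islice:
    """An iterator wrapper that supports slice syntax by re-wrapping
    itself in another itertools.islice."""

    def __init__(self, iterable, start=None, stop=None, step=None):
        # islice accepts None for start, stop and step.
        self._it = itertools.islice(iterable, start, stop, step)

    def __iter__(self):
        return iter(self._it)

    def __next__(self):
        return next(self._it)

    def __getitem__(self, s):
        if not isinstance(s, slice):
            raise TypeError("only slices are supported")
        return sliceable_islice(self._it, s.start, s.stop, s.step)

print(list(sliceable_islice(range(100))[7:20]))
# The confusing part Andrew warns about: a second slice applies to the
# *remaining* items, not to the original iterable.
s = sliceable_islice(range(10))[2:]
print(list(s[3:5]))  # positions 3-4 of what's left, i.e. 5 and 6
```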
participants (13)
- Alex Hall
- Andrew Barnert
- Chris Angelico
- Christopher Barker
- David Mertz
- Greg Ewing
- Jonathan Fine
- Paul Moore
- Ram Rachum
- Rhodri James
- Ricky Teachey
- Stephen J. Turnbull
- Steven D'Aprano