On May 9, 2020, at 19:43, Christopher Barker <pythonchb@gmail.com> wrote:

On Sat, May 9, 2020 at 1:03 PM Andrew Barnert <abarnert@yahoo.com> wrote:

I haven’t read the whole thing yet, but one thing immediately jumped out at me:

> and methods on containers, such as dict.keys return iterators in Python 3, 

No they don’t. They return views—objects that are collections in their own right (in particular, they’re not one-shot; they can be iterated over and over) but just delegate to another object rather than storing the data.
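
To make the difference concrete, here is a minimal sketch (the dict contents and variable names are just for illustration) showing that a keys view can be iterated repeatedly and tracks later changes to the dict:

```python
d = {"a": 1, "b": 2}
keys = d.keys()            # a view object, not a one-shot iterator

first_pass = list(keys)
second_pass = list(keys)   # iterating again yields the same keys

d["c"] = 3                 # the view delegates to d, so it sees the new key
after_insert = list(keys)
size = len(keys)           # views support len() and `in`, which iterators do not
```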

Thanks -- that's the kind of thing that led me to say that this is probably not ready for a PEP.

but I don't think that invalidates the idea at all -- there is debate about what an "islice" should return, but an iterable view would be a good option.

I don’t think it invalidates the basic idea at all, just that it suggests the design should be different.

Originally, dict returned lists for keys, values, and items. In 2.2, iterator variants were added. In 3.0, the list and iterator variants were both replaced with view versions, which were enough of an improvement that they were backported to 2.x, because a view covers almost all of the uses of both a sequence copy and an iterator. And I think the same is true here.

I'm inclined to think that it would be a bad idea to have it return a full sequence view object, and not sure it should do anything other than be iterable.

Why? What’s the downside to being able to do more with them for the same performance cost and only a little more up-front design work?

> And this is important here, because a view is what you ideally _want_. The reason range, key view, etc. are views rather than iterators isn’t that it’s easier to implement or explain or anything, it’s that it’s a little harder to implement and explain but so much more useful that it’s worth it. It’s something people take advantage of all the time in real code.

Maybe -- but "all the time?" I'd venture to say that absolutely the most common thing done with, e.g., dict.keys() is to iterate over it.

Really? When I just want to iterate over a dict’s keys, I iterate the dict itself. 

> For prior art specifically on slicing as a view, rather than just views in general, see memoryview (which only works on buffers, not all sequences) and NumPy (which is weird in many ways, but people rely on slicing giving you a storage-sharing view)
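
As a small illustration of the memoryview case (the buffer contents are made up for the example), slicing a memoryview gives a storage-sharing view, while slicing the bytearray itself gives a copy:

```python
buf = bytearray(b"hello world")
view = memoryview(buf)[0:5]   # slicing a memoryview copies nothing

view[0] = ord("J")            # write through the slice view
shared = bytes(buf)           # the underlying buffer changed too

copy = buf[0:5]               # slicing the bytearray itself makes a copy
copy[0] = ord("C")
unchanged = bytes(buf)        # the original is not affected by the copy
```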

I am a long-time numpy user, and yes, I very much take advantage of the memory sharing view.

But I do not think that that would be a good idea for the standard library. numpy slices return a full-fledged numpy array, which shares a data view with its "host" -- this is really helpful for performance reasons, since moving large blocks of data around is expensive, but it's also pretty confusing. And it would be a lot more problematic with, e.g., lists, as the underlying buffer can be reallocated -- numpy arrays are mutable, but not resizable; once you've made one, its data buffer does not change.

That’s no more of a problem for a list slice view than for any of the existing views. The simplest way to implement a view is to keep a reference to the underlying object and delegate to it, which is effectively what the dict views do.
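
A rough sketch of that keep-a-reference-and-delegate approach, assuming a hypothetical SliceView class (the name and constructor signature are invented here, not an existing stdlib type or the implementation discussed elsewhere in this thread):

```python
from collections.abc import Sequence


class SliceView(Sequence):
    """Hypothetical read-only view over seq[start:stop:step]."""

    def __init__(self, seq, start=None, stop=None, step=None):
        self._seq = seq                          # keep a reference, copy nothing
        self._slice = slice(start, stop, step)

    def _range(self):
        # Recomputed against the current length on every access, so the
        # underlying sequence being resized or reallocated does no harm.
        return range(*self._slice.indices(len(self._seq)))

    def __len__(self):
        return len(self._range())

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing a SliceView is left out of this sketch")
        # range handles negative indexes and raises IndexError when out of bounds
        return self._seq[self._range()[index]]


data = list(range(10))
view = SliceView(data, 3, 6)
before = list(view)        # [3, 4, 5]
data[4] = 40
after_write = list(view)   # [3, 40, 5]
del data[:5]               # data is now [5, 6, 7, 8, 9]
after_resize = list(view)  # [8, 9]
```

Inheriting from Sequence supplies iteration, `in`, reversed(), index() and count() on top of the two methods defined here.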

(Well, did from 2.x to 3.5. The dict improvements in 3.6 opened up an optimization opportunity, because in the split layout a dict is effectively a wrapper around a keys view and a separate table, so the keys view can refer directly to that thing that already exists. But that isn’t relevant here.)

(You _could_ instead refuse to allow expanding a sequence when there’s a live view, as bytearray does with memoryview, but I don’t think that’s necessary here. It’s only needed there as a consequence of the fact that the buffer protocol is provided in C rather than in Python. For a slice view, it would just make things more complicated and less functional for no good reason.)
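
For reference, this is what the bytearray/memoryview restriction looks like in practice (a small sketch; the buffer contents are arbitrary):

```python
buf = bytearray(b"abc")
view = memoryview(buf)

try:
    buf.append(ord("d"))   # resizing while a buffer export is live
    resize_error = None
except BufferError as exc:
    resize_error = exc     # the resize is refused

view.release()             # once the view is released, resizing works again
buf.append(ord("d"))
```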

> But just replacing islice is a much simpler task (mainly because the input has to be a sequence and the output is always a sequence, so the only complexity that arises is whether you want to allow mutable views into mutable sequences), and it may well be useful on its own.

Agreed. And while yes, dict_keys and friends are not JUST iterators, they also aren't very functional views, either. They are not sequences, 

That’s not true. They are very functional—as functional as reasonably makes sense. The only reason they’re not Sequences is that they’re views on dicts, so indexing makes little sense, but set operations do—and they are in fact Sets. (Except for values.)
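
A quick sketch of what that means in practice (the two dicts are made up for the example):

```python
from collections.abc import Sequence, Set

a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}

common = a.keys() & b.keys()    # keys present in both dicts
only_a = a.keys() - b.keys()    # keys present only in a
item_overlap = a.items() & {("x", 1), ("y", 99)}

keys_is_set = isinstance(a.keys(), Set)         # True
items_is_set = isinstance(a.items(), Set)       # True
values_is_set = isinstance(a.values(), Set)     # False: values need not be unique or hashable
keys_is_seq = isinstance(a.keys(), Sequence)    # False
```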

certainly not mutable sequences.

Well, yes, but mutating a dict through its views wouldn’t make sense in the first place:

    >>> d = {1: 2}
    >>> k = d.keys()
    >>> k |= {3}

You’ve told it to add an item with key 3 without telling it what the value is, and there’s no reasonable thing that could mean. A slice view would have no such problem, so mutation is sensible.

That being said, mutation could easily be added later without breaking anything, and it does raise some nontrivial design issues (most obviously, notice that my implementation only allows non-size-changing mutations, because otherwise you have to decide whether it remains a view over seq[3:5] or becomes a view over seq[3:6]; all three options seem reasonable there, so I just went with the simplest, and have no good argument for why it’s the best…). So I think it might be better to leave mutation out of the original version anyway unless someone has a need for it (at which point we can use the examples to think through the best answers to the design issues).


> (in particular, they’re not one-shot; they can be iterated over and over)

yes, but they are only a single iterator -- if you call iter() on one you always get the same one back, and its state is preserved.

No, that’s not true. Each call to iter() returns a completely independent iterator, with its own independent state that starts at the head of the view. It works exactly the same way as a set, a tuple, or any other normal collection:

    >>> d = {1: 2, 3: 4, 5: 6}
    >>> k = d.keys()
    >>> i1 = iter(k)
    >>> next(i1)
    1
    >>> i2 = iter(k)
    >>> next(i2)
    1
    >>> list(i1)
    [3, 5]
    >>> next(i2)
    3

(This was a bit harder to see, and to explain, before 3.6, because that order was intentionally arbitrary, but it was guaranteed to be consistent until you mutated the dict.)

Also notice, while the views’ iterators are just like dict iterators, and list iterators for that matter, in that they can’t handle the dict being resized during iteration, the views themselves have no such trouble:

    >>> d[7] = 8
    >>> next(i2)
    RuntimeError: dictionary changed size during iteration
    >>> i3 = iter(k)
    >>> next(i3)
    1

Basically, views are not like iterators at all, except in that they save time and space by being lazy.

So yes, you can iterate over it more than once, but iter() only resets after it's been exhausted before.

Such a resettable-iterator thing (which would have some precedent in file objects, I suppose) would actually be harder to implement, on top of being less powerful and potentially confusing. And the same is true for slices.

In short -- not having thought about it deeply at all, but I'm thinking that making a SliceIterator very similar to dict_keys and friends would make a lot of sense.

Yes, as long as that means being a full-featured normal collection (in this case a Sequence rather than a Set), not a resettable iterator.