Enable subscription operator for generator expressions

How about enabling the subscription operator (`[]`) for generator expressions, and also for `zip()`, `key()`, etc.? They could be evaluated in the background only up to the requested index, to avoid evaluating the whole expression into something like a list or tuple and then indexing that.

Although this is not a pattern I recall ever needing, except for getting the first item of a generator, I recognize that it is more complicated than it should be. However, not only would that be too big a change for all these objects, I think one would also expect an object providing index access with `[]` to have a `len`.

I also see it as potentially making a lot of code error-prone: let's say one gets passed a generator where a sequence is expected. In current Python, if an item is accessed by index, one just gets an explicit TypeError. If these objects gain indexing, two consecutive accesses to `gen[1]` will consume the generator and return different values. That could be very confusing.

On the other hand, as I said, I can't come up with a simple pattern to get the nth item, so we should probably think of an easy and performant way. One way I can think of is to have a named parameter to the `next` built-in that would allow one to move forward more than one position. Say: `fith_element = next(gen, skip=4)`

And finally, one way I could think of to retrieve the nth element is:

    In [19]: a = (i for i in range(0, 100, 10))
    In [20]: next(b for i, b in enumerate(a) if i==5)
    Out[20]: 50

It definitely feels like there should be a simpler way, but I just could not come up with it.

On Tue, 17 Nov 2020 at 10:35, Nuri Jung <jnooree@snu.ac.kr> wrote:
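For what it's worth, the itertools documentation lists a recipe for getting the nth item. A minimal sketch follows; the name `nth` comes from that recipe, and the sample generator is the one from the session above. It is one possible pattern, not a proposal for a new built-in.

```python
from itertools import islice

def nth(iterable, n, default=None):
    """Return the item at 0-based position n, or default if there are too few items."""
    # islice lazily skips the first n items; next() then takes one more.
    return next(islice(iterable, n, None), default)

a = (i for i in range(0, 100, 10))
print(nth(a, 5))  # 50
# The generator has now been advanced past position 5, so a second call
# counts from where the first one stopped.
print(nth(a, 0))  # 60
```

Note that the position is always relative to the current state of the iterator, which is the same source of confusion described for `gen[1]`.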

On Tue, 17 Nov 2020 at 22:19, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
It's possible to write this yourself:

    from itertools import islice

    class ISlice:
        def __init__(self, it):
            self.it = iter(it)
        def __getitem__(self, s):
            if isinstance(s, slice):
                return islice(self.it, s.start, s.stop, s.step)
            # Presumably an integer
            return islice(self.it, s, s+1)

    if __name__ == "__main__":
        a = range(10)
        print(list(ISlice(a)[5:9]))
        a = range(10)
        print(list(ISlice(a)[5]))

I don't know whether I'd find it useful enough to be worth it, though...

Paul
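With the wrapper above, an integer subscript yields a one-item islice rather than the item itself. A possible variant that returns the item is sketched below. The class name `ISliceItem` and the choice to raise IndexError on exhaustion are assumptions made for illustration, not part of the original message.

```python
from itertools import islice

class ISliceItem:
    """Hypothetical variant of the ISlice wrapper: integer subscripts return the item."""

    def __init__(self, it):
        self.it = iter(it)

    def __getitem__(self, s):
        if isinstance(s, slice):
            return islice(self.it, s.start, s.stop, s.step)
        # Integer index: skip s items, then return the next one.
        # islice rejects negative values with ValueError.
        try:
            return next(islice(self.it, s, None))
        except StopIteration:
            raise IndexError("index out of range") from None

w = ISliceItem(range(10))
print(w[1])  # 1
# The underlying iterator has moved on, so the same subscript
# now refers to a different item.
print(w[1])  # 3
```

The second lookup shows why subscripting an iterator is easy to misread: the index is an offset from the current position, not from the start.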

On Tue, 17 Nov 2020 at 22:35, Paul Moore <p.f.moore@gmail.com> wrote:
I can write many things myself. That doesn't mean that it wouldn't be good if someone already wrote it for me (and for everyone else). In this case the islice function has already been written but it seems to have missed a trick by not using Python's compact slice notation.

--
Oscar

On Wed, Nov 18, 2020 at 2:35 AM Joao S. O. Bueno <jsbueno@python.org.br> wrote:
Extremely confusing, and I think that would be enough to kill the idea.
The very concept of "the nth item" doesn't work with generators, so I think there's little reason to try to define it usefully. However...
(Ohh, the strong fifth element - Boron?)
... there may be some value in this simple "skip" option. For the record, the most normal way to do this sort of thing would be the islice function:

https://docs.python.org/3/library/itertools.html#itertools.islice

but if all you want to do is "skip the next four, then take the next one after that", it would be convenient to quickly pump the generator a few times before returning a value. This isn't something I often need personally, but I can definitely see the value of it.

+0.5; this does get asked for a good bit, and a keyword argument on next() would be a lot less confusing than directly subscripting a generator.

ChrisA
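The built-in `next()` has no `skip` argument today, so the behaviour under discussion can only be approximated with a helper. A minimal sketch, assuming a hypothetical function `next_skip` whose `skip` keyword mirrors the suggestion in this thread:

```python
from itertools import islice

_MISSING = object()  # sentinel, so that None remains usable as a default

def next_skip(iterator, default=_MISSING, *, skip=0):
    """Hypothetical helper: discard `skip` items, then return the next one."""
    # Pump the iterator `skip` times without keeping the values.
    for _ in islice(iterator, skip):
        pass
    if default is _MISSING:
        return next(iterator)  # raises StopIteration when exhausted
    return next(iterator, default)

gen = (i for i in range(0, 100, 10))
fifth_element = next_skip(gen, skip=4)
print(fifth_element)  # 40
print(next(gen))      # 50
```

Like `next()` itself, this expects an iterator rather than an arbitrary iterable, and it only ever moves forward.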

I agree with your detailed explanation, and it would be a great idea to add a keyword argument to the `next()` function. Just for reference, I believe C++ also has a similar function, `std::next()`, which advances 'iterators', and it also works on non-indexable containers (e.g. linked lists).

On Tue, Nov 17, 2020 at 03:42:54AM -0000, Nuri Jung wrote:
How about enabling subscription operator (`[]`) for generator expressions?
Generator expressions are iterators, and the iterator protocol is intentionally very simple. You only need to provide two things for an object to be an iterator:

* a method `__iter__` that returns self;
* a method `__next__` that returns the next generated value.

While there is nothing that prevents people from adding extra functionality to their own custom iterator classes, the std lib generally keeps iterators pretty simple.

We can talk about the practical difficulty of implementing such a thing without providing either a very confusing user experience or being exceedingly memory inefficient, or both. Consider a generator comprehension:

    gen = (time.time() for i in itertools.cycle([None]))

How would you jump ahead to see what `gen[1000]` is? Having jumped forward to `gen[1000]`, how do you jump back to give `gen[0]` without storing the entire sequence?

The essence of subscription on sequences is that it gives random access to a sequence of items. Trying to force random access on arbitrary iterators that yield unpredictable values is hard. Efficiency of on-demand calculation and convenience of random access do not go well together. You can't have both except in very special circumstances, e.g. range objects. (Which are not iterators!)

--
Steve
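To make the trade-off concrete, here is a small sketch. `Countdown` is a made-up iterator that implements only the two protocol methods, and `CachedIndex` is a hypothetical wrapper that offers `[]` by storing every item it has pulled so far. Both names are assumptions for illustration, and only non-negative integer indices are handled.

```python
class Countdown:
    """Minimal iterator: implements only __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class CachedIndex:
    """Hypothetical wrapper: random access paid for by caching every item seen."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._seen = []

    def __getitem__(self, index):
        # Assumes a non-negative integer index.
        # Pull items from the iterator until the cache covers `index`.
        while len(self._seen) <= index:
            try:
                self._seen.append(next(self._it))
            except StopIteration:
                raise IndexError("index out of range") from None
        return self._seen[index]


c = CachedIndex(Countdown(5))
print(c[3])          # 2
print(c[0])          # 5, served from the cache
print(len(c._seen))  # 4 items are being held in memory
```

Jumping back to `c[0]` works only because the earlier items were kept, so for an endless iterator like the `time.time()` example the cache would grow without bound.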

participants (7)
- Chris Angelico
- Joao S. O. Bueno
- Marco Sulla
- Nuri Jung
- Oscar Benjamin
- Paul Moore
- Steven D'Aprano