Method to efficiently advance iterators for sequences that support random access

str_iterator, bytes_iterator, range_iterator, list_iterator, and tuple_iterator (and probably others) should have a method that is capable of efficiently advancing the iterator, instead of having to call next repeatedly.

I suggest adding an itertools.advance function which dispatches to a dunder __advance__ method (if one exists) or, as a fallback, calls next repeatedly. Then, the iterators mentioned above (and any others capable of efficiently doing so) would implement __advance__, which would directly manipulate their index to efficiently "jump" the desired number of elements in constant time rather than linear time. For example, if you have a large list and want to iterate over it, but skip the first 50000 elements, you should be able to do something like:

```python
it = iter(mylist)
itertools.advance(it, 50000)
```

Note that you technically can do this with itertools.islice by supplying a start value, but itertools.islice effectively just calls next repeatedly on your behalf, so if you're skipping a lot of elements, it's unnecessarily slow.

As a side note, I noticed that list_iterator has __setstate__, which can be used to (more or less) accomplish this, but that seems very hack-y. It also sets the index directly (rather than adding to it), so it'd be more awkward to use if the iterator is already partially exhausted.

```python
it = iter(mylist)
it.__setstate__(50000)
```
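A minimal pure-Python sketch of the proposed function. The `__advance__` dunder is hypothetical (it does not exist today); the fallback is the documented islice consume idiom, which calls next() repeatedly at C speed:

```python
from itertools import islice

def advance(it, n):
    """Skip ``it`` ahead by ``n`` items.

    Dispatches to a hypothetical __advance__ dunder when the iterator
    provides one; otherwise falls back to consuming n items via islice.
    """
    meth = getattr(type(it), "__advance__", None)
    if meth is not None:
        meth(it, n)
    else:
        # Consume n items; returns None silently if the iterator runs dry.
        next(islice(it, n, n), None)
```

Under this sketch, `advance(iter(mylist), 50000)` would fall back to the islice path today, and switch to the O(1) path automatically if list_iterator ever grew `__advance__`.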

You can use slice:

```python
new_iterator = iterator[50001:]
it2 = iter(new_iterator)
```

or range:

```python
for i in range(50001, len(iterator)):
    x = iterator[i]
```

On Tue, Oct 6, 2020 at 00:37 Kevin Mills <kevin.mills226@gmail.com> wrote:
Perhaps you can suggest an improvement to the `consume` recipe in the itertools documentation.
I confess I don't get it. If the object is a list, then starting at slice n will be efficient. If the object is a more general iterable, you will need to count on `next` being called, because there may be side effects. Can you illuminate how your idea is better than list slicing in the first case, or accounts for side effects in the second?
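For reference, the `consume` recipe from the itertools documentation reads roughly as follows (paraphrased from the docs):

```python
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n steps ahead. If n is None, consume entirely."
    if n is None:
        # Feed the entire iterator into a zero-length deque.
        collections.deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)
```

Note that both branches still call next() once per skipped element (in C), which is exactly the linear cost the proposal wants to avoid.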

Sorry for the duplicate message. I realized two seconds after I sent it that I only replied to you and not the group.

I didn't see the `consume` recipe until after I posted, or I probably would've mentioned it. What I want would have to be done in C, because `it_index` (as `listiterobject` and `unicodeiterobject` call it; not sure about the other iterators I mentioned) isn't exposed in Python-land, with the exception of the `__setstate__` workaround I mentioned before. Slicing large lists/strings/bytes/tuples is both memory and time inefficient.

I don't think that `__setstate__` would even work with more general iterators; that's not what I was trying to say. `itertools.advance` (or maybe it should be called `itertools.consume` for consistency with the recipe) should use whatever is the most efficient way to advance the iterator without causing issues. Only iterators that are explicitly created tied to sequences that support random access would be able to efficiently jump forward; others would just call `next` repeatedly. `list_iterator` / `listiterobject`, for example, doesn't have to worry about losing any side-effects by jumping ahead.

Does that answer your questions/concerns?

What I do not understand is why you need to use the iterator instead of using the iterable itself. This way you can jump to whatever position without slicing.

On Tue, Oct 6, 2020 at 10:14 AM Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
if you want the Nth item, that's easy, yes. if you want to iterate through items N to the end, then how do you do that without either iterating through the first N and throwing them away, or making a slice, which copies the rest of the sequence? Is this a big deal? maybe not -- islice is written in C, and pretty darn fast, but there is a missing efficiency here, which could be addressed either by providing access to set the iterator "counter", as proposed, or by providing a view object that could be a slice without copying. -CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
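As an aside: for bytes-like data, a "slice without copying" already exists in the stdlib via memoryview. A small sketch:

```python
data = bytes(range(256))
view = memoryview(data)

# Slicing a memoryview produces a subview, not a copy of the bytes.
sub = view[100:]
assert sub[0] == 100

# The subview is still backed by the original object.
assert sub.obj is data
assert len(sub) == 156
```

General sequences (list, tuple, str) have no such view type, which is the gap the "sequence view" idea would fill.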

On Tue, Oct 6, 2020 at 7:21 PM Christopher Barker <pythonchb@gmail.com> wrote:
```python
for i in range(start, stop):
    x = lst[i]
    process(x)
```

The only problem is that there's slightly more execution in Python-land than in C-land, but that only matters if `process(x)` does very little and you're really concerned about performance. I can see how the proposal could be useful, but only in very limited use cases.

On Tue, Oct 6, 2020 at 10:28 AM Alex Hall <alex.mojaki@gmail.com> wrote:
well yes, of course. but it's kind of a principle in Python that you don't use indices to iterate through a sequence :-) And I still like the sequence view idea, because then the creation of the "subview" could be done outside the iteration code, which would not have to know that it was working with a sequence rather than any arbitrary iterable. -CHB
that is pretty much the same argument as using islice -- it will iterate through the first N elements, but at C speed ... Which is why I see this as "one more use for a sequence view", rather than enough of a motivator in itself. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Wed, Oct 7, 2020 at 4:53 AM Christopher Barker <pythonchb@gmail.com> wrote:
Slices returning views would be elegant, but backward incompatible, so it might have to be something like lst.view[start:stop] to create a view.

I'm not a fan of an "advance this iterator" feature (it'd need a dunder on the iterator to make it possible, and I don't see a lot of use for it), but perhaps some iterators could themselves support slicing. That would be completely backward compatible, since currently none of the iterators concerned have any support for indexing. If you had that, you could build your own advance() or consume() function, something like:

```python
def advance(it, n):
    try:
        return it[n:]
    except TypeError:
        return itertools.islice(it, n, None)
```

Still unconvinced of its value, but a good use-case would sway me, and the C implementation for list_iterator wouldn't be too hard (it'd construct a new list_iterator with the same underlying list and an incremented index). ChrisA

This:

```python
def advance(it, n):
    try:
        return it[n:]
    except TypeError:
        return itertools.islice(it, n, None)
```

has the disadvantages of:

1. Requiring a temporary copy of the data sliced (if len(it) is 1_000_000, and n is 500_000, you're stuck between 500_000 pointless __next__ calls, a 500_000-long temporary list, or inefficiently looking up 500_000 elements by index)

2. Not working with sequence iterators, only sequences themselves (so if you want to read 1000, skip 1000, read 1000, skip 1000, over and over, you can't just use a single stateful iterator for the whole process without using the consume recipe and calling __next__ 1000 times for each skip)

3. Not working with all sorts of things where a dedicated advance implementation would allow efficient skipping, e.g. the new insertion-ordered dicts, when no deletions have been performed (a state we already track for other reasons), can advance the iterator (for the raw dict, keys, values or items) with the same efficiency as sequences do (and could save a lot of work building and tossing tuples iterating .items() even if it can't assume no dummy entries); the various itertools functions like combinations, permutations and combinations_with_replacement (and in some cases, product) could also be advanced efficiently without the expense of generating the intermediate states.

Point is, the OP's case (a single sequence, advanced exactly once) is the only one that implementation addresses, and it still has scaling issues even then.

On Tue, Oct 6, 2020 at 6:21 PM Chris Angelico <rosuav@gmail.com> wrote:

I’m still waiting for an example of a real app where this matters. -- --Guido (mobile)

So, Sequence views that do direct addressing with doubly-linked lists? https://docs.python.org/3/library/collections.abc.html#collections.abc.Seque... https://docs.python.org/3/library/stdtypes.html#dict-views :
You may be looking for (directly-addressable) NumPy arrays? https://numpy.org/doc/stable/reference/generated/numpy.array.html https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html :

a.view(ndarray_subclass) or a.view(type=ndarray_subclass) just returns an instance of ndarray_subclass that looks at the same array (same shape, dtype, etc.) This does not cause a reinterpretation of the memory.

On Tue, Oct 6, 2020, 1:35 PM Alex Hall <alex.mojaki@gmail.com> wrote:

Arrow Buffers, memoryview, array.array

Apache Arrow Buffers support zero-copy slicing:

arrow::Buffer can be zero-copy sliced to permit Buffers to cheaply reference other Buffers, while preserving memory lifetime and clean parent-child relationships.

There are many implementations of arrow::Buffer, but they all provide a standard interface: a data pointer and length. This is similar to Python's built-in buffer protocol and memoryview objects.

A Buffer can be created from any Python object implementing the buffer protocol by calling the py_buffer() function.

https://arrow.apache.org/docs/python/memory.html#pyarrow-buffer

https://docs.python.org/3/library/stdtypes.html#memoryview :

A memoryview supports slicing and indexing to expose its data. One-dimensional slicing will result in a subview. For higher dimensions, the length is equal to the length of the nested list representation of the view. The itemsize attribute will give you the number of bytes in a single element.

https://docs.python.org/3/c-api/memoryview.html
https://docs.python.org/3/library/array.html#module-array

On Tue, Oct 6, 2020, 1:56 PM Wes Turner <wes.turner@gmail.com> wrote:

On Tue, Oct 6, 2020, 1:21 PM Christopher Barker
```python
it = (lst[i] for i in range(N, len(lst)))
```

I haven't benchmarked whether this is faster than islice. It might depend on how many you wind up consuming. It's slightly cumbersome to write, I suppose. But it also seems like something one RARELY needs.
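For what it's worth, the two spellings yield the same items; a quick sketch showing the equivalence on a small list:

```python
from itertools import islice

lst = list(range(10))
N = 4

# Generator expression indexing directly into the list, as suggested above.
gen = (lst[i] for i in range(N, len(lst)))

# islice drops the first N items by calling next() repeatedly (at C speed).
sli = islice(iter(lst), N, None)

assert list(gen) == list(sli) == [4, 5, 6, 7, 8, 9]
```

The genexpr does O(1) index arithmetic per skipped element never touched, while islice must actually consume the first N items; which wins in practice would need benchmarking, as noted.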

I am +0.3 on this as I don't personally have a need for this but do see the utility. I can think of a number of examples where an `__advance__` would be preferable to any of the proposed solutions:

- A skip list which doesn't support O(1) random access but can advance faster than naively calling next repeatedly
- A lazy infinite iterator which can efficiently calculate its state at some future point (e.g. `itertools.count`)
- A tree iterator could perform efficient `__advance__` by checking the size of sub-trees before descending into them (I'm not sure why you would ever want to `__advance__` a tree iterator).

My ladder two examples demonstrate that this could have utility outside of sequences but for iterators in general. -- Caleb Donovick

On Tue, Oct 6, 2020 at 1:13 PM David Mertz <mertz@gnosis.cx> wrote:

On Tue, Oct 06, 2020 at 02:27:54PM -0700, Caleb Donovick wrote:
For `__advance__` to be an official Python protocol, it would almost certainly have to be of use for *general purpose iterators*, not just specialised ones -- and probably not *hypothetical* iterators which may not even exist. Do you have concrete examples of your skip list and tree iterators that are in wide-spread use?

Specialised iterators can create whatever extra APIs they want to support, but the official iterator protocol intentionally has a very basic API:

- anything with an `__iter__` method which returns itself;
- and a `__next__` method that returns the next value, raising StopIteration when exhausted.

This is a bare minimum needed to make an iterator, and we like it that way. For starters, it means that generators are iterators.

If people want to supply objects that support the iterator protocol but also offer a rich API including:

- peek
- restart
- previous
- jump ahead (advance)

all features that have been proposed, there is nothing stopping you from adding those features to your iterator classes. But they all have problems if considered to be necessary for *all* iterators.

I would expect that, given a sufficiently compelling real-world use-case, we would be prepared to add a jump ahead method to list-iterators, as a specific feature of that iterator, not of all iterators.
What's your use case for advancing a count object, rather than just creating a new one?

```python
it = itertools.count()  # start at 0
process(it)             # process some values
it.advance(state)       # jump forward
process(it)             # process some more values
```

as opposed to what is already possible:

```python
it = itertools.count()
process(it)
it = itertools.count(state)
process(it)
```

Real-world use-cases for this feature are far more useful than contrived and artificial use-cases unlikely to ever occur in real code.
My ladder two examples demonstrate that this could have utility outside of sequences but for iterators in general.
I'm sorry, I don't know what your ladder two examples are. Did you post them in another thread? -- Steve
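For comparison, the spellings available today for jumping a count iterator ahead (a small sketch; islice consumes items via next() at C speed, while a fresh count starts at the target state directly):

```python
from itertools import count, islice

# Skip ahead by consuming: islice walks 500 next() calls internally.
it = islice(count(), 500, None)
assert next(it) == 500

# Or just build a new counter at the target state.
it = count(500)
assert next(it) == 500
```

Both land in the same state; the difference the proposal cares about is only the linear cost of the first spelling.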

On Tue, Oct 6, 2020 at 6:16 PM Steven D'Aprano <steve@pearwood.info> wrote:
I think that was an auto-correct for "latter" two examples, i.e. the last two he gave. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, Oct 6, 2020 at 18:16 Steven D'Aprano <steve@pearwood.info> wrote:
Yeah, I’m still waiting for the real use case to be revealed. (There may well be one.) Specialised iterators can create whatever extra APIs they want to
There’s a precedent though, __length_hint__ (PEP 424). The OP anticipated this, with “[a] function which dispatches to a dunder __advance__ method (if one exists) or, as a fallback, calls next repeatedly.” Clearly the function would be on a par with len() and next(). So it would be optional to implement but the operation would be available for all iterators (even generators). If people want to supply objects that support the iterator protocol but also offer a rich API including:
Strawman, since all except advance() would require some kind of buffering to build them out of the basics. I would expect that, given a sufficiently compelling real-world
But the advance() function could support all iterators using the OP’s fallback. —Guido -- --Guido (mobile)

On Tue, Oct 06, 2020 at 10:25:01PM -0700, Guido van Rossum wrote:
Yes that's the critical thing -- we could expose the internal state of the list iterator if it was desirable. It is already exposed as a (private?) dunder:

```python
py> it = iter([10, 20, 30, 40, 50])
py> next(it)
10
py> it.__setstate__(3)
py> next(it)
40
py> it.__setstate__(0)
py> next(it)
10
```

but once we expose it, we can't easily change our mind again. So it is reasonable to be a bit cautious about locking this interface in as a public feature. (Aside: I'm actually rather surprised that it's exposed as a dunder.)
Sure, and I appreciate that we could offer an O(N) fallback. Your point is well taken. There's precedent with the `in` operator too, which falls back on iteration and equality if no `__contains__` method is defined. I'm not sure that either len or next are good precedents? As far as I can tell, len() does not call `__length_hint__`, and next() only dispatches to `__next__`.
It's hardly a strawman when people actually have requested each of those as extensions to the iterator protocol! Don't make me go hunting for references :-) As for the buffering issue, sure, that's a point against those proposals, but itertools provides a tee function that buffers the iterator. So "needs a buffer" is not necessarily a knock-down objection to these features, even for the std lib.
Sure. We could do that. What's the interface? Is this a skip ahead by N steps, or skip directly to state N? I can imagine uses for both. Can we skip backwards if the underlying list supports it? `listiter.__setstate__` supports the second interface. There's no getstate dunder that I can see. Should there be?

Here's a cautionary tale to suggest some caution. Back in the days when Python's PRNG was Wichmann-Hill, we added a `jumpahead(n)` interface to step forward n steps more efficiently than just calling random n times. This lasted exactly two releases, 2.1 and 2.2, before the PRNG changed and stepping forward n steps efficiently was no longer possible, and the method changed to essentially ignore n and just jump ahead to some distant state. https://docs.python.org/release/2.3/lib/module-random.html

I'm not arguing against this proposal, or for it. I'm just mentioning some considerations which should be considered :-) -- Steve

On Wed, Oct 7, 2020 at 2:13 AM Steven D'Aprano <steve@pearwood.info> wrote:
[about `__setstate__`] (Aside: I'm actually rather surprised that it's exposed as a dunder.)
It's used for pickling. Someone long ago must have complained that list iterators weren't picklable, and we complied. I'm not sure that either len or next are good precedents? As far as I
can tell, len() does not call `__length_hint__`, and next() only dispatches to `__next__`.
I just meant that these are examples of a common pattern in Python, of a *function* wrapping a dunder *method*. Your example (`in` -> `__contains__` with a fallback if that doesn't exist) is better because it shows that a fallback is a known pattern; but it isn't exactly a function. As for the buffering issue, sure, that's a point against those
Well, the buffering requires forethought (you can't go back unless you had the forethought to set up a buffer first) and consumes memory (which iterators are meant to avoid), so the argument against these is much stronger, and different from the argument against advance() -- the latter's presence costs nothing unless you call it.
What's the interface? Is this a skip ahead by N steps, or skip directly to state N? I can imagine uses for both.
Not all iterators remember how often next() was called, so "skip to state N" is not a reasonable API. The only reasonable thing advance(N) can promise is to be equivalent to calling next() N times.
Can we skip backwards if the underlying list supports it?
We shouldn't allow this, since it wouldn't work if the input iterator was changed from a list iterator to e.g. a generator.
`listiter.__setstate__` supports the second interface. There's no getstate dunder that I can see. Should there be?
It's called `__reduce__`. These are used for pickling and the state they pass around is supposed to be opaque.
Here's a cautionary tale to suggest some caution. [...]
I guess the worst that could happen in our case is that some class used to be implemented on top of a list and at some point changed to a linked list, and the performance of advance(N) changed from O(1) to O(N). But that's not going to happen to Python's fundamental data types (list, tuple, bytes, str, array), since (for better or for worse) they have many other aspects of their API (notably indexing and slicing) that would change from O(1) to O(N) if the implementation changed to something other than an array. I'm not arguing against this proposal, or for it. I'm just mentioning
some considerations which should be considered :-)
Same here, for sure. Still waiting for that real-world use case... :-) -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

__setstate__, a generic __getstate__, listiter.__setstate__, (typed) arrays and memoryviews

```python
import collections.abc
from array import array

list_ = [0, 10, 20]
assert [list_[i] for i in range(len(list_))] == [0, 10, 20]

iterator = iter(list_)
assert [next(iterator) for n in range(2)] == [0, 10]

iterator = iter(list_)
assert iterator.__reduce__() == (iter, (list_,), 0)
assert next(iterator) == 0
assert iterator.__reduce__() == (iter, (list_,), 1)
assert next(iterator) == 10
assert iterator.__reduce__() == (iter, (list_,), 2)

iterator.__setstate__(0)
assert iterator.__reduce__() == (iter, (list_,), 0)
assert next(iterator) == 0
assert next(iterator) == 10
assert next(iterator) == 20
assert iterator.__reduce__() == (iter, (list_,), 3)
assert iterator.__reduce__() == (iter, (list_,), len(list_))

try:
    next(iterator)
except StopIteration:
    pass

assert iterator.__reduce__() == (iter, ([],))

iterator.__setstate__(1)
try:
    assert next(iterator) == 10
except StopIteration:
    pass

iterator = iter(list_)
iterator.__setstate__(1)
assert next(iterator) == 10
assert iterator.__reduce__() == (iter, (list_,), 2)

try:
    [1, 2, 3].__reduce__()
    [1, 2, 3].__reduce_ex__(0)
    [1, 2, 3].__reduce_ex__(1)
except TypeError as e:
    assert e.args[0] == "can't pickle list objects"
[1, 2, 3].__reduce_ex__(2)

def __getstate__(obj):
    if (not isinstance(obj, collections.abc.Iterable)
            or isinstance(obj, list)):
        raise TypeError('__getstate__ only works with iterables', type(obj))
    reducefunc = getattr(obj, '__reduce__ex__', False)
    reduceoutput = reducefunc(2) if reducefunc else obj.__reduce__()
    if len(reduceoutput) < 3:
        raise StopIteration  # ?
    return reduceoutput[2]

iterator = iter(list_)
assert __getstate__(iterator) == 0
next(iterator)
assert __getstate__(iterator) == 1
next(iterator)
assert __getstate__(iterator) == 2
next(iterator)
assert __getstate__(iterator) == 3
try:
    next(iterator)
except StopIteration:
    pass
try:
    __getstate__(iterator)
except StopIteration:
    pass

iterator = iter(list_)
assert __getstate__(iterator) == 0
assert next(iterator) == 0
assert __getstate__(iterator) == 1
iterator.__setstate__(0)
assert __getstate__(iterator) == 0
assert next(iterator) == 0
assert __getstate__(iterator) == 1

try:
    __getstate__([1, 2, 3])
except TypeError as e:
    assert e.args[0] == "__getstate__ only works with iterables"
    assert e.args[1] == list, e.args[1]  # list_iterator; type(iter(list()))
    pass

# arrays must be typed;
# otherwise random access isn't possible
# because skipping ahead or back by n*size requires n calls to size(array[n])
list_ary = array('i', list_)
iterator = iter(list_ary)
assert [next(iterator) for n in range(2)] == [0, 10]

ary_memoryview = memoryview(list_ary)
iterator = iter(ary_memoryview)
assert [next(iterator) for n in range(2)] == [0, 10]

assert ary_memoryview.obj == list_ary
assert ary_memoryview.tolist() == list_
assert ary_memoryview[1] == 10
ary_memoryview[1] = 100
assert ary_memoryview[1] == 100
assert list_ary[1] == 100
assert ary_memoryview[:2].tolist() == [0, 100]
list_ary[1] = 1000
assert ary_memoryview[1] == 1000
assert ary_memoryview[:2].tolist() == [0, 1000]

ary_memoryview.release()
try:
    ary_memoryview[:2].tolist()
except ValueError as e:
    assert e.args[0] == "operation forbidden on released memoryview object"

list_ = [0, 10, 20]
iterable = iter(list_)
assert next(iterable) == 0
list_.insert(1, 5)
assert next(iterable) == 5
```

- https://docs.python.org/3/library/pickle.html#object.__setstate__
- listiter_setstate: https://github.com/python/cpython/blob/v3.10.0a1/Objects/listobject.c#L3215-...
```c
static PyObject *
listiter_setstate(listiterobject *it, PyObject *state)
{
    Py_ssize_t index = PyLong_AsSsize_t(state);
    if (index == -1 && PyErr_Occurred())
        return NULL;
    if (it->it_seq != NULL) {
        if (index < 0)
            index = 0;
        else if (index > PyList_GET_SIZE(it->it_seq))
            index = PyList_GET_SIZE(it->it_seq); /* iterator exhausted */
        it->it_index = index;
    }
    Py_RETURN_NONE;
}
```

On Wed, Oct 7, 2020 at 11:32 AM Guido van Rossum <guido@python.org> wrote:

"I am +0.3 on this as I don't personally have a need" " (I'm not sure why you would ever want to `__advance__` a tree iterator)"
What's your use case for advancing a count object, rather than just creating a new one?
Itertools.count was an example (hence the use of "e.g.") of an iterator which can be efficiently advanced without producing intermediate state. Clearly anyone can advance it manually. My point is that an iterator may have an efficient way to calculate its state at some point in the future without needing to calculate the intermediate state. For example the fibonacci sequence has a closed-form formula for the nth element and hence could be advanced efficiently.

I realize my original examples were contrived, but I have a better one. Consider the task of collecting TCP headers from an iterator of bytes:

```
def get_tcp_headers(stream: Iterator[Byte]):
    while stream:
        # Move to the total length field of the IP header
        stream.advance(2)
        # record the total length (two bytes)
        total_length = ...
        # skip the rest of IP header
        stream.advance(28)
        # record the TCP header
        header = ...
        yield header
        stream.advance(total_length - 32 - len(header))
```

Maybe Kevin can tell us what motivated him to post this idea, but I can see many places in parsing where you might want to skip arbitrary portions of a stream. The beauty of iterators is you don't need to be concerned with the underlying data structure. Ideally I shouldn't need to write two versions of some parse function: one which operates on sequences and one that operates on iterables, just so I can efficiently `advance` the sequences. -- Caleb Donovick

On Tue, Oct 6, 2020 at 6:16 PM Steven D'Aprano <steve@pearwood.info> wrote:

Addendum to my example: If my get_tcp_headers was made a class, it would also be possible for it to support the `__advance__` protocol:

```
class TCPHeaderIter(Iterator[TCPHeader]):
    def __init__(self, stream: Iterator[Byte]):
        self.stream = stream

    def __next__(self) -> TCPHeader:
        # similar to the body of the while loop
        ...

    def __advance__(self, n):
        for _ in range(n):
            self.stream.advance(2)
            total_length = ...
            self.stream.advance(total_length - 4)
```

Now I don't have a use case for `__advance__` on a TCP header iterator, but one might want to sample every Nth header.

On Wed, Oct 7, 2020 at 3:06 PM Caleb Donovick <donovick@cs.stanford.edu> wrote:

On Wed, Oct 7, 2020 at 6:24 PM Caleb Donovick <donovick@cs.stanford.edu> wrote:
Yes, this is technically an example. But this doesn't get us any closer to a real-world use case. If you want an iterator that counts from N, the spelling `count(N)` exists now. If you want to start counting N elements later than wherever you are now, I guess do:

```python
new_count = count(next(old_count) + N)
```

For example the fibonacci sequence has a closed
form formula for the nth element and hence could be advanced efficiently.
Sure. And even more relevantly, if you want the Nth Fibonacci you can write a closed-form function `nth_fib()` to get it. These are toy examples where *theoretically* a new magic method could be used, but they're not close to a use case that would motivate changing the language.
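For concreteness, the closed form alluded to here is Binet's formula; a sketch of such an `nth_fib()` (the function name is just the one used above, not an existing API):

```python
def nth_fib(n):
    """Closed-form (Binet) formula for the nth Fibonacci number.

    Floating-point rounding keeps this exact only for modest n
    (roughly n < 71 with IEEE doubles).
    """
    sqrt5 = 5 ** 0.5
    phi = (1 + sqrt5) / 2
    return round(phi ** n / sqrt5)
```

A hypothetical fibonacci iterator's `__advance__` could jump via the same formula instead of stepping n times, which is the point being made.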
This is BARELY more plausible as a real-world case. Throwing away 28 bytes with a bunch of next() calls is completely trivial in time. A case where some implementation could conceivably save measurable time would require skipping 100s of thousands or millions of next() calls... and probably calls that actually took some work to compute to matter, even there. What you'd need to motivate the new API is a case where you might skip a million items in an iterator, and yet the million-and-first item is computable without computing all the others. Ideally something where each of those million calls does something more than just copy a byte from a kernel buffer. I don't know that such a use case does not exist, but nothing comes to my mind, and no one has suggested one in this thread. Otherwise, itertools.islice() completely covers the situation already. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

Seeing as an IP packet could in theory be as large as 64K, `stream.advance(total_length - 32 - len(header))` could be skipping ~65,000 next calls. (Although in practice packets are very unlikely to exceed 1,500 bytes because of the Ethernet standard.) In any event, avoiding ~1,500 next calls per packet is hardly insignificant if you want to process more than a handful of packets. Now a better rejection of my example is that this sort of code does not belong in Python and should be in a systems language, an argument I would agree with. -- Caleb Donovick

On Wed, Oct 7, 2020 at 3:38 PM David Mertz <mertz@gnosis.cx> wrote:

Yes on systems language. But also, why read IP packets as a bytes iterator? Reading the whole 64k into a bytes object is trivial memory, and then it's random access anyway. I was trying to construct a scenario where this could be faster. Maybe the network is VERY slow and the whole buffer isn't available at once. But then .advance() still has to block on the bytes anyway. I suppose you can invent a story where the first 32k bytes of each packet are fast on the network, then the other 32k bytes are slow. It feels like you have to fine-tune a very unusual scenario to produce even minor benefit.

On Wed, Oct 7, 2020, 7:06 PM Caleb Donovick <donovick@cs.stanford.edu> wrote:

I suspect you will encounter a bit of resistance here, because itertools is about, well, iterators, and designed to work with arbitrary iterables, and what you are looking for is something optimized for Sequences. Which doesn't mean it couldn't be there.
Slicing large lists/strings/bytes/tuples is both memory and time inefficient.
Exactly. Which is why I'd like to have a "view slice", and/or a more general sequence view object, as outlined here: https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst and discussed on this list here: https://mail.python.org/archives/list/python-ideas@python.org/message/LEMV3W... (from four years ago -- which I only just noticed!) And more recently: https://mail.python.org/archives/list/python-ideas@python.org/message/U35BS5...

That kinda petered out, with, from what I could tell, little (no) interest from any core devs. But this is another motivating use case -- so time to revive it?

BTW -- see above: Serhiy Storchaka, four years ago, proposed a "sequence tools" module -- which would address the "problem" of adding things to itertools that are only useful for Sequences. -CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

You can use slice: new_iterator = iterator[50001:] it2 = iter(new_iterator) or range: for i in range(50001, len(iterator)): x = iterator[i]

On Tue, Oct 6, 2020 at 00:37 Kevin Mills <kevin.mills226@gmail.com> wrote:
Perhaps you can suggest an improvement to the `consume` recipe in the itertools documentation. As a side note, I noticed that list_iterator has __setstate__ which can be
I confess I don't get it. If the object is a list, then starting at slice n will be efficient. If the object is a more general iterable, you will need to count on `next` being called, because there may be side effects. Can you illuminate how your idea is better than list slicing in the first case, or accounts for side effects in the second?

Sorry for the duplicate message. I realized two seconds after I sent it, that I only replied to you and not the group. I didn't see the `consume` recipe until after I posted, or I probably would've mentioned it. What I want would have to be done in C, because `it_index` (as `listiterobject` and `unicodeiterobject` call it, not sure about the other iterators I mentioned) isn't exposed in Python-land, with the exception of the `__setstate__` workaround I mentioned before. Slicing large lists/strings/bytes/tuples is both memory and time inefficient. I don't think that `__setstate__` would even work with more general iterators, that's not what I was trying to say. `itertools.advance` (or maybe it should be called `itertools.consume` for consistency with the recipe) should use whatever is the most efficient way to advance the iterator without causing issues. Only iterators that are explicitly created tied to sequences that support random access would be able to efficiently jump forward; others would just call `next` repeatedly. `list_iterator` / `listiterobject` for example doesn't have to worry about losing any side-effects by jumping ahead. Does that answer your questions/concerns?

What I do not understand is why you need to use the iterator instead of using the iterable itself. This way you can jump to whatever position without slicing.

On Tue, Oct 6, 2020 at 10:14 AM Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:
If you want the Nth item, that's easy, yes. If you want to iterate through items N to the end, then how do you do that without either iterating through the first N and throwing them away, or making a slice, which copies the rest of the sequence? Is this a big deal? Maybe not -- islice is written in C, and pretty darn fast, but there is a missing efficiency here, which could be addressed by either providing access to set the iterator "counter", as proposed, or by providing a view object that could be a slice without copying. -CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, Oct 6, 2020 at 7:21 PM Christopher Barker <pythonchb@gmail.com> wrote:
```python
for i in range(start, stop):
    x = lst[i]
    process(x)
```

The only problem is that there's slightly more execution in Python-land than in C-land, but that only matters if `process(x)` does very little and you're really concerned about performance. I can see how the proposal could be useful, but only in very limited use cases.

On Tue, Oct 6, 2020 at 10:28 AM Alex Hall <alex.mojaki@gmail.com> wrote:
Well yes, of course. But it's kind of a principle in Python that you don't use indices to iterate through a sequence :-) And I still like the sequence view idea, because then the creation of the "subview" could be done outside the iteration code, which would not have to know that it was working with a sequence rather than any arbitrary iterable. -CHB
That is pretty much the same argument as using islice -- it will iterate through the first N elements, but at C speed ... Which is why I see this as "one more use for a sequence view", rather than enough of a motivator in itself. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Wed, Oct 7, 2020 at 4:53 AM Christopher Barker <pythonchb@gmail.com> wrote:
Slices returning views would be elegant, but backward incompatible, so it might have to be something like lst.view[start:stop] to create a view. I'm not a fan of an "advance this iterator" feature (it'd need a dunder on the iterator to make it possible, and I don't see a lot of use for it), but perhaps some iterators could themselves support slicing. That would be completely backward compatible, since currently none of the iterators concerned have any support for indexing. If you had that, you could build your own advance() or consume() function, something like:

```python
def advance(it, n):
    try:
        return it[n:]
    except TypeError:
        return itertools.islice(it, n, None)
```

Still unconvinced of its value, but a good use-case would sway me, and the C implementation for list_iterator wouldn't be too hard (it'd construct a new list_iterator with the same underlying list and an incremented index). ChrisA

This:

```python
def advance(it, n):
    try:
        return it[n:]
    except TypeError:
        return itertools.islice(it, n, None)
```

has the disadvantages of:

1. Requiring a temporary copy of the data sliced (if len(it) is 1_000_000, and n is 500_000, you're stuck between 500_000 pointless __next__ calls, a 500_000 long temporary list, or inefficiently looking up 500_000 elements by index)

2. Not working with sequence iterators, only sequences themselves (so if you want to read 1000, skip 1000, read 1000, skip 1000, over and over, you can't just use a single stateful iterator for the whole process without using the consume recipe and calling __next__ 1000 times for each skip)

3. Not working with all sorts of things where a dedicated advance implementation would allow efficient skipping, e.g. the new insertion-ordered dicts, when no deletions have been performed (a state we already track for other reasons), can advance the iterator (for the raw dict, keys, values or items) with the same efficiency as sequences do (and could save a lot of work building and tossing tuples iterating .items() even if it can't assume no dummy entries); the various itertools functions like combinations, permutations and combinations_with_replacement (and in some cases, product) could also be advanced efficiently without the expense of generating the intermediate states.

Point is, the OP's case (a single sequence, advanced exactly once) is the only one that implementation addresses, and it still has scaling issues even then.

On Tue, Oct 6, 2020 at 6:21 PM Chris Angelico <rosuav@gmail.com> wrote:
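For the read-1000/skip-1000 pattern in point 2, here is a sketch of what a single stateful iterator can already do in CPython via the list iterator's pickling hook (using `__setstate__` this way is an implementation detail, as noted elsewhere in the thread, not a supported API):

```python
from itertools import islice

def read_skip(seq, read, skip):
    """Yield chunks of `read` items, skipping `skip` items between chunks."""
    it = iter(seq)
    pos = 0
    while True:
        chunk = list(islice(it, read))
        if not chunk:
            return
        yield chunk
        pos += len(chunk) + skip
        # O(1) jump via the list iterator's pickling hook (CPython detail)
        it.__setstate__(pos)

data = list(range(10))
assert list(read_skip(data, read=2, skip=3)) == [[0, 1], [5, 6]]
```

With the proposed `itertools.advance`, the `__setstate__` line would become an `advance(it, skip)` call that works on any iterator, not just list iterators.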

I’m still waiting for an example of a real app where this matters. -- --Guido (mobile)

So, Sequence views that do direct addressing with doubly-linked lists? https://docs.python.org/3/library/collections.abc.html#collections.abc.Seque... https://docs.python.org/3/library/stdtypes.html#dict-views :
You may be looking for (directly-addressable) NumPy arrays?

https://numpy.org/doc/stable/reference/generated/numpy.array.html
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html :

a.view(ndarray_subclass) or a.view(type=ndarray_subclass) just returns an instance of ndarray_subclass that looks at the same array (same shape, dtype, etc.) This does not cause a reinterpretation of the memory.

On Tue, Oct 6, 2020, 1:35 PM Alex Hall <alex.mojaki@gmail.com> wrote:

Arrow Buffers, memoryview, array.array

Apache Arrow Buffers support zero-copy slicing:

arrow::Buffer can be zero-copy sliced to permit Buffers to cheaply reference other Buffers, while preserving memory lifetime and clean parent-child relationships.

There are many implementations of arrow::Buffer, but they all provide a standard interface: a data pointer and length. This is similar to Python's built-in buffer protocol and memoryview objects.

A Buffer can be created from any Python object implementing the buffer protocol by calling the py_buffer() function.

https://arrow.apache.org/docs/python/memory.html#pyarrow-buffer

https://docs.python.org/3/library/stdtypes.html#memoryview :

A memoryview supports slicing and indexing to expose its data. One-dimensional slicing will result in a subview. For higher dimensions, the length is equal to the length of the nested list representation of the view. The itemsize attribute will give you the number of bytes in a single element.

https://docs.python.org/3/c-api/memoryview.html
https://docs.python.org/3/library/array.html#module-array

On Tue, Oct 6, 2020, 1:56 PM Wes Turner <wes.turner@gmail.com> wrote:

On Tue, Oct 6, 2020, 1:21 PM Christopher Barker
    it = (lst[i] for i in range(N, len(lst)))

I haven't benchmarked whether this is faster than islice. It might depend on how many you wind up consuming. It's slightly cumbersome to write, I suppose. But it also seems like something one RARELY needs.
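Both spellings yield the same tail; a quick sanity check of that equivalence (not a benchmark -- relative speed depends on how much of the tail is actually consumed):

```python
from itertools import islice

lst = list(range(20))
N = 15

# islice discards the first N items one next() call at a time (at C speed)
a = list(islice(iter(lst), N, None))
# the index-based generator expression never touches the skipped items
b = list(lst[i] for i in range(N, len(lst)))
assert a == b == [15, 16, 17, 18, 19]
```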

I am +0.3 on this as I don't personally have a need for this but do see the utility. I can think of a number of examples where an `__advance__` would be preferable to any of the proposed solutions:

- A skip list which doesn't support O(1) random access but can advance faster than naively calling next repeatedly
- A lazy infinite iterator which can efficiently calculate its state at some future point (e.g. `itertools.count`)
- A tree iterator could perform efficient `__advance__` by checking the size of sub trees before descending into them (I'm not sure why you would ever want to `__advance__` a tree iterator).

My ladder two examples demonstrate that this could have utility outside of sequences but for iterators in general.

-- Caleb Donovick

On Tue, Oct 6, 2020 at 1:13 PM David Mertz <mertz@gnosis.cx> wrote:

On Tue, Oct 06, 2020 at 02:27:54PM -0700, Caleb Donovick wrote:
For `__advance__` to be an official Python protocol, it would almost certainly have to be of use for *general purpose iterators*, not just specialised ones -- and probably not *hypothetical* iterators which may not even exist. Do you have concrete examples of your skip list and tree iterators that are in wide-spread use?

Specialised iterators can create whatever extra APIs they want to support, but the official iterator protocol intentionally has a very basic API:

- anything with an `__iter__` method which returns itself;
- and a `__next__` method that returns the next value, raising StopIteration when exhausted.

This is a bare minimum needed to make an iterator, and we like it that way. For starters, it means that generators are iterators.

If people want to supply objects that support the iterator protocol but also offer a rich API including:

- peek
- restart
- previous
- jump ahead (advance)

all features that have been proposed, there is nothing stopping you from adding those features to your iterator classes. But they all have problems if considered to be necessary for *all* iterators.

I would expect that, given a sufficiently compelling real-world use-case, we would be prepared to add a jump ahead method to list-iterators, as a specific feature of that iterator, not of all iterators.
What's your use case for advancing a count object, rather than just creating a new one?

    it = itertools.count()  # start at 0
    process(it)             # process some values
    it.advance(state)       # jump forward
    process(it)             # process some more values

as opposed to what is already possible:

    it = itertools.count()
    process(it)
    it = itertools.count(state)
    process(it)

Real-world use-cases for this feature are far more useful than contrived and artificial use-cases unlikely to ever occur in real code.
My ladder two examples demonstrate that this could have utility outside of sequences but for iterators in general.
I'm sorry, I don't know what your ladder two examples are. Did you post them in another thread? -- Steve

On Tue, Oct 6, 2020 at 6:16 PM Steven D'Aprano <steve@pearwood.info> wrote:
I think that was an auto-correct for "latter" two examples, i.e. the last two he gave. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, Oct 6, 2020 at 18:16 Steven D'Aprano <steve@pearwood.info> wrote:
Yeah, I’m still waiting for the real use case to be revealed. (There may well be one.) Specialised iterators can create whatever extra APIs they want to
There’s a precedent though, __length_hint__ (PEP 424). The OP anticipated this, with “[a] function which dispatches to a dunder __advance__ method (if one exists) or, as a fallback, calls next repeatedly.” Clearly the function would be on a par with len() and next(). So it would be optional to implement but the operation would be available for all iterators (even generators). If people want to supply objects that support the iterator protocol but also offer a rich API including:
Strawman, since all except advance() would require some kind of buffering to build them out of the basics. I would expect that, given a sufficiently compelling real-world
But the advance() function could support all iterators using the OP’s fallback. —Guido -- --Guido (mobile)

On Tue, Oct 06, 2020 at 10:25:01PM -0700, Guido van Rossum wrote:
Yes, that's the critical thing -- we could expose the internal state of the list iterator if it was desirable. It is already exposed as a (private?) dunder:

    py> it = iter([10, 20, 30, 40, 50])
    py> next(it)
    10
    py> it.__setstate__(3)
    py> next(it)
    40
    py> it.__setstate__(0)
    py> next(it)
    10

but once we expose it, we can't easily change our mind again. So it is reasonable to be a bit cautious about locking this interface in as a public feature. (Aside: I'm actually rather surprised that it's exposed as a dunder.)
Sure, and I appreciate that we could offer an O(N) fallback. Your point is well taken. There's precedent with the `in` operator too, which falls back on iteration and equality if no `__contains__` method is defined. I'm not sure that either len or next are good precedents? As far as I can tell, len() does not call `__length_hint__`, and next() only dispatches to `__next__`.
It's hardly a strawman when people actually have requested each of those as extensions to the iterator protocol! Don't make me go hunting for references :-) As for the buffering issue, sure, that's a point against those proposals, but itertools provides a tee function that buffers the iterator. So "needs a buffer" is not necessarily a knock-down objection to these features, even for the std lib.
Sure. We could do that. What's the interface? Is this a skip ahead by N steps, or skip directly to state N? I can imagine uses for both. Can we skip backwards if the underlying list supports it? `listiter.__setstate__` supports the second interface. There's no getstate dunder that I can see. Should there be? Here's a cautionary tale to suggest some caution. Back in the days when Python's PRNG was Wichmann-Hill, we added a `jumpahead(n)` interface to step forward n steps more efficiently than just calling random n times. This lasted exactly two releases, 2.1 and 2.2, before the PRNG changed and stepping forward n steps efficiently was no longer possible, and the method changed to essentially ignore n and just jump ahead to some distant state. https://docs.python.org/release/2.3/lib/module-random.html I'm not arguing against this proposal, or for it. I'm just mentioning some considerations which should be considered :-) -- Steve

On Wed, Oct 7, 2020 at 2:13 AM Steven D'Aprano <steve@pearwood.info> wrote:
[about `__setstate__`] (Aside: I'm actually rather surprised that it's exposed as a dunder.)
It's used for pickling. Someone long ago must have complained that list iterators weren't picklable, and we complied. I'm not sure that either len or next are good precedents? As far as I
can tell, len() does not call `__length_hint__`, and next() only dispatches to `__next__`.
I just meant that these are examples of a common pattern in Python, of a *function* wrapping a dunder *method*. Your example (`in` -> `__contains__` with a fallback if that doesn't exist) is better because it shows that a fallback is a known pattern; but it isn't exactly a function. As for the buffering issue, sure, that's a point against those
Well, the buffering requires forethought (you can't call go back unless you had the forethought to set up a buffer first) and consumes memory (which iterators are meant to avoid) so the argument against these is much stronger, and different from the argument against advance() -- the latter's presence costs nothing unless you call it.
What's the interface? Is this a skip ahead by N steps, or skip directly to state N? I can imagine uses for both.
Not all iterators remember how often next() was called, so "skip to state N" is not a reasonable API. The only reasonable thing advance(N) can promise is to be equivalent to calling next() N times.
Can we skip backwards if the underlying list supports it?
We shouldn't allow this, since it wouldn't work if the input iterator was changed from a list iterator to e.g. a generator.
`listiter.__setstate__` supports the second interface. There's no getstate dunder that I can see. Should there be?
It's called `__reduce__`. These are used for pickling and the state they pass around is supposed to be opaque.
Here's a cautionary tale to suggest some caution. [...]
I guess the worst that could happen in our case is that some class used to be implemented on top of a list and at some point changed to a linked list, and the performance of advance(N) changed from O(1) to O(N). But that's not going to happen to Python's fundamental data types (list, tuple, bytes, str, array), since (for better or for worse) they have many other aspects of their API (notably indexing and slicing) that would change from O(1) to O(N) if the implementation changed to something other than an array. I'm not arguing against this proposal, or for it. I'm just mentioning
some considerations which should be considered :-)
Same here, for sure. Still waiting for that real-world use case... :-) -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

__setstate__, a generic __getstate__, listiter.__setstate__, (typed) arrays and memoryviews

```python
import collections.abc
from array import array

list_ = [0, 10, 20]
assert [list_[i] for i in range(len(list_))] == [0, 10, 20]

iterator = iter(list_)
assert [next(iterator) for n in range(2)] == [0, 10]

iterator = iter(list_)
assert iterator.__reduce__() == (iter, (list_,), 0)
assert next(iterator) == 0
assert iterator.__reduce__() == (iter, (list_,), 1)
assert next(iterator) == 10
assert iterator.__reduce__() == (iter, (list_,), 2)

iterator.__setstate__(0)
assert iterator.__reduce__() == (iter, (list_,), 0)
assert next(iterator) == 0
assert next(iterator) == 10
assert next(iterator) == 20
assert iterator.__reduce__() == (iter, (list_,), 3)
assert iterator.__reduce__() == (iter, (list_,), len(list_))
try:
    next(iterator)
except StopIteration:
    pass
assert iterator.__reduce__() == (iter, ([],))

iterator.__setstate__(1)
try:
    assert next(iterator) == 10
except StopIteration:
    pass

iterator = iter(list_)
iterator.__setstate__(1)
assert next(iterator) == 10
assert iterator.__reduce__() == (iter, (list_,), 2)

try:
    [1, 2, 3].__reduce__()
    [1, 2, 3].__reduce_ex__(0)
    [1, 2, 3].__reduce_ex__(1)
except TypeError as e:
    assert e.args[0] == "can't pickle list objects"
[1, 2, 3].__reduce_ex__(2)

def __getstate__(obj):
    if (not isinstance(obj, collections.abc.Iterable)
            or isinstance(obj, list)):
        raise TypeError('__getstate__ only works with iterables', type(obj))
    reducefunc = getattr(obj, '__reduce__ex__', False)
    reduceoutput = reducefunc(2) if reducefunc else obj.__reduce__()
    if len(reduceoutput) < 3:
        raise StopIteration  # ?
    return reduceoutput[2]

iterator = iter(list_)
assert __getstate__(iterator) == 0
next(iterator)
assert __getstate__(iterator) == 1
next(iterator)
assert __getstate__(iterator) == 2
next(iterator)
assert __getstate__(iterator) == 3
try:
    next(iterator)
except StopIteration:
    pass
try:
    __getstate__(iterator)
except StopIteration:
    pass

iterator = iter(list_)
assert __getstate__(iterator) == 0
assert next(iterator) == 0
assert __getstate__(iterator) == 1
iterator.__setstate__(0)
assert __getstate__(iterator) == 0
assert next(iterator) == 0
assert __getstate__(iterator) == 1

try:
    __getstate__([1, 2, 3])
except TypeError as e:
    assert e.args[0] == "__getstate__ only works with iterables"
    assert e.args[1] == list, e.args[1]  # list_iterator; type(iter(list()))

# arrays must be typed;
# otherwise random access isn't possible
# because skipping ahead or back by n*size requires n calls to size(array[n])
list_ary = array('i', list_)
iterator = iter(list_ary)
assert [next(iterator) for n in range(2)] == [0, 10]

ary_memoryview = memoryview(list_ary)
iterator = iter(ary_memoryview)
assert [next(iterator) for n in range(2)] == [0, 10]
assert ary_memoryview.obj == list_ary
assert ary_memoryview.tolist() == list_
assert ary_memoryview[1] == 10
ary_memoryview[1] = 100
assert ary_memoryview[1] == 100
assert list_ary[1] == 100
assert ary_memoryview[:2].tolist() == [0, 100]
list_ary[1] = 1000
assert ary_memoryview[1] == 1000
assert ary_memoryview[:2].tolist() == [0, 1000]
ary_memoryview.release()
try:
    ary_memoryview[:2].tolist()
except ValueError as e:
    assert e.args[0] == "operation forbidden on released memoryview object"

list_ = [0, 10, 20]
iterable = iter(list_)
assert next(iterable) == 0
list_.insert(1, 5)
assert next(iterable) == 5
```

- https://docs.python.org/3/library/pickle.html#object.__setstate__
- listiter_setstate: https://github.com/python/cpython/blob/v3.10.0a1/Objects/listobject.c#L3215-... :

```c
static PyObject *
listiter_setstate(listiterobject *it, PyObject *state)
{
    Py_ssize_t index = PyLong_AsSsize_t(state);
    if (index == -1 && PyErr_Occurred())
        return NULL;
    if (it->it_seq != NULL) {
        if (index < 0)
            index = 0;
        else if (index > PyList_GET_SIZE(it->it_seq))
            index = PyList_GET_SIZE(it->it_seq); /* iterator exhausted */
        it->it_index = index;
    }
    Py_RETURN_NONE;
}
```

On Wed, Oct 7, 2020 at 11:32 AM Guido van Rossum <guido@python.org> wrote:

"I am +0.3 on this as I don't personally have a need" " (I'm not sure why you would ever want to `__advance__` a tree iterator)"
What's your use case for advancing a count object, rather than just creating a new one?
Itertools.count was an example (hence the use of "e.g.") of an iterator which can be efficiently advanced without producing intermediate state. Clearly anyone can advance it manually. My point is that an iterator may have an efficient way to calculate its state at some point in the future without needing to calculate the intermediate state. For example the Fibonacci sequence has a closed form formula for the nth element and hence could be advanced efficiently.

I realize my original examples were contrived, but I have a better one. Consider the task of collecting TCP headers from an iterator of bytes:

```
def get_tcp_headers(stream: Iterator[Byte]):
    while stream:
        # Move to the total length field of the IP header
        stream.advance(2)
        # record the total length (two bytes)
        total_length = ...
        # skip the rest of IP header
        stream.advance(28)
        # record the TCP header
        header = ...
        yield header
        stream.advance(total_length - 32 - len(header))
```

Maybe Kevin can tell us what motivated him to post this idea but I can see many places in parsing where you might want to skip arbitrary portions of a stream. The beauty of iterators is you don't need to be concerned with the underlying data structure. Ideally I shouldn't need to write two versions of some parse function: one which operates on sequences and one that operates on iterables, just so I can efficiently `advance` the sequences.

-- Caleb Donovick

On Tue, Oct 6, 2020 at 6:16 PM Steven D'Aprano <steve@pearwood.info> wrote:
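The skipping pattern itself can be tried today with the consume recipe standing in for the proposed `stream.advance` -- a runnable toy with a made-up record format (one length byte, one tag byte we keep, then payload we skip), not actual IP/TCP parsing:

```python
from itertools import islice

def skip(stream, n):
    # stand-in for the proposed stream.advance(n): n next() calls
    next(islice(stream, n, n), None)

def take(stream, n):
    return bytes(islice(stream, n))

def tags(stream):
    """Yield the tag byte of each record, skipping the rest of the payload."""
    while True:
        header = take(stream, 1)
        if not header:
            return
        length = header[0]          # payload length, including the tag byte
        tag = take(stream, 1)
        yield tag[0]
        skip(stream, length - 1)    # skip the remaining payload bytes

data = iter(bytes([3, 7, 0, 0, 2, 9, 0]))
assert list(tags(data)) == [7, 9]
```

With an `__advance__`-aware iterator underneath, `skip` could jump in O(1) instead of consuming each payload byte.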

Addendum to my example: If my get_tcp_header was made a class it would also be possible for it to support the `__advance__` protocol:

```
class TCPHeaderIter(Iterator[TCPHeader]):
    def __init__(self, stream: Iterator[Byte]):
        self.stream = stream

    def __next__(self) -> TCPHeader:
        # similar to the body of the while loop
        ...

    def __advance__(self, n):
        for _ in range(n):
            self.stream.advance(2)
            total_length = ...
            self.stream.advance(total_length - 4)
```

Now I don't have a use case for `__advance__` on a TCP header iterator but one might want to sample every N'th header.

On Wed, Oct 7, 2020 at 3:06 PM Caleb Donovick <donovick@cs.stanford.edu> wrote:

On Wed, Oct 7, 2020 at 6:24 PM Caleb Donovick <donovick@cs.stanford.edu> wrote:
Yes, this is technically an example. But this doesn't get us any closer to a real-world use case. If you want an iterator that counts from N, the spelling `count(N)` exists now. If you want to start counting N elements later than wherever you are now, I guess do:

    new_count = count(next(old_count) + N)

For example the fibonacci sequence has a closed
form formula for the nth element and hence could be advanced efficiently.
Sure. And even more relevantly, if you want the Nth Fibonacci you can write a closed-form function `nth_fib()` to get it. These are toy examples where *theoretically* a new magic method could be used, but they're not close to a use case that would motivate changing the language.
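To make the closed-form point concrete, here is a sketch of a Fibonacci iterator whose hypothetical `__advance__` jumps n states in O(log n) multiplications via the fast-doubling identities, instead of n additions (the dunder name is the proposal's; nothing in the stdlib dispatches to it today):

```python
class FibIter:
    """Yields F(0), F(1), ...; internal state is the pair (F(k), F(k+1))."""
    def __init__(self):
        self.a, self.b = 0, 1

    def __iter__(self):
        return self

    def __next__(self):
        value = self.a
        self.a, self.b = self.b, self.a + self.b
        return value

    @staticmethod
    def _pair(n):
        # fast doubling: returns (F(n), F(n+1)) in O(log n) steps
        if n == 0:
            return 0, 1
        f, g = FibIter._pair(n >> 1)
        c = f * (2 * g - f)      # F(2m)
        d = f * f + g * g        # F(2m+1)
        return (d, c + d) if n & 1 else (c, d)

    def __advance__(self, n):
        # F(k+n) = F(k)F(n+1) + F(k-1)F(n), with F(k-1) = F(k+1) - F(k)
        fn, fn1 = self._pair(n)
        a, b = self.a, self.b
        self.a, self.b = a * fn1 + (b - a) * fn, b * fn1 + a * fn

it = FibIter()
assert [next(it) for _ in range(5)] == [0, 1, 1, 2, 3]
it.__advance__(5)   # jump from F(5) to F(10) without the middle terms
assert next(it) == 55
```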
This is BARELY more plausible as a real-world case. Throwing away 28 bytes with a bunch of next() calls is completely trivial in time. A case where some implementation could conceivably save measurable time would require skipping 100s of thousands or millions of next() calls... and probably calls that actually took some work to compute to matter, even there. What you'd need to motivate the new API is a case where you might skip a million items in an iterator, and yet the million-and-first item is computable without computing all the others. Ideally something where each of those million calls does something more than just copy a byte from a kernel buffer. I don't know that such a use case does not exist, but nothing comes to my mind, and no one has suggested one in this thread. Otherwise, itertools.islice() completely covers the situation already. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

Seeing as an IP packet could in theory be as large as 64K, `stream.advance(total_length - 32 - len(header))` could be skipping ~65,000 next calls. (Although in practice packets are very unlikely to exceed 1,500 bytes because of the ethernet standard.) In any event, avoiding ~1500 next calls per packet is hardly insignificant if you want to process more than a handful of packets. Now a better rejection of my example is that this sort of code does not belong in python and should be in a systems language, an argument I would agree with.

-- Caleb Donovick

On Wed, Oct 7, 2020 at 3:38 PM David Mertz <mertz@gnosis.cx> wrote:

Yes on systems language. But also, why read IP packets as a bytes iterator? Reading the whole 64k into a bytes object is trivial memory, and then it's random access anyway. I was trying to construct a scenario where this could be faster. Maybe the network is VERY slow and the whole buffer isn't available at once. But then .advance() still has to block on the bytes anyway. I suppose you can invent a story where the first 32k bytes of each packet are fast on the network, then the other 32k bytes are slow. It feels like you have to fine-tune a very unusual scenario to produce even minor benefit.

On Wed, Oct 7, 2020, 7:06 PM Caleb Donovick <donovick@cs.stanford.edu> wrote:

I suspect you will encounter a bit of resistance here, because itertools is about, well, iterators, and designed to work with arbitrary iterables, and what you are looking for is something optimized for Sequences. Which doesn't mean it couldn't be there.
Slicing large lists/strings/bytes/tuples is both memory and time inefficient.
Exactly. Which is why I'd like to have a "view slice", and/or a more general sequence view object, as outlined here: https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst

and discussed on this list here: https://mail.python.org/archives/list/python-ideas@python.org/message/LEMV3W... (from four years ago -- which I only just noticed!)

And more recently: https://mail.python.org/archives/list/python-ideas@python.org/message/U35BS5...

That kinda petered out with, from what I could tell, little (no) interest from any core devs. But this is another motivating use case -- so time to revive it?

BTW -- see above: Serhiy Storchaka, four years ago, proposed a "sequence tools" module -- which would address the "problem" of adding things to itertools that are only useful for Sequences.

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
participants (13)

- Alex Hall
- Cade Brown
- Caleb Donovick
- Chris Angelico
- Christopher Barker
- David Mertz
- Guido van Rossum
- Josh Rosenberg
- Kevin Mills
- Marco Sulla
- Michael Smith
- Steven D'Aprano
- Wes Turner