[Python-ideas] Batching/grouping function for itertools
Steven D'Aprano
steve at pearwood.info
Sun Dec 8 17:45:19 CET 2013
On Mon, Dec 09, 2013 at 12:34:28AM +0900, Stephen J. Turnbull wrote:
> Chris Angelico writes:
>
> > How are you going to take the next n items from dice_roller without
> > advancing it?
>
> Memoize.
Er, I don't think so. How does the memoizing cache get those values if
the underlying iterator isn't advanced? Obviously it can't.
itertools.tee uses a cache, so we can demonstrate the issue:
py> import itertools
py> it = iter("abcde")
py> wrapper = itertools.tee(it, 2)[0]
py> _ = list(wrapper)
If the iterator hasn't advanced, then next(it) should yield 'a'. But:
py> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Any sort of "iterator look-ahead" has a number of fundamental problems.
Those problems are part of the reason why, despite many requests, Python
iterators don't provide a "peek" method to look ahead: not even by a
single value, let alone an arbitrary number of values.
- the cache would require unbounded memory (unless you limit
  the look-ahead to N values);
- iterators with side-effects would cause those side-effects
  at the wrong time;
- iterators whose values are time-dependent could have them
  calculated at a different time from when they are returned,
  potentially giving the wrong result.
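To make the side-effect problem concrete, here is a minimal sketch. The Peekable class and the noisy generator are hypothetical names invented for this illustration; neither is part of the standard library. The point is only that peeking forces the underlying iterator to run before the consumer has asked for the value:

```python
class Peekable:
    """Hypothetical one-item look-ahead wrapper (illustration only)."""

    _MISSING = object()  # sentinel meaning "nothing cached yet"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cached = self._MISSING

    def __iter__(self):
        return self

    def peek(self):
        # Peeking has no choice but to advance the underlying iterator.
        if self._cached is self._MISSING:
            self._cached = next(self._it)
        return self._cached

    def __next__(self):
        # Hand out the cached value first, if there is one.
        if self._cached is not self._MISSING:
            value, self._cached = self._cached, self._MISSING
            return value
        return next(self._it)


log = []

def noisy():
    # An iterator with a side-effect: it records when each value is made.
    for i in range(3):
        log.append("produced %d" % i)
        yield i


p = Peekable(noisy())
p.peek()
log.append("consumer asks for first value")
first = next(p)
print(log)
```

The log shows the side-effect happening before the consumer's request, which is exactly the "wrong time" described in the second point above.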
For something like tee, it is difficult to see any way other than
memoisation to get the functionality needed, so we just have to live
with the limitations. But offering dedicated look-ahead with caching as
fundamental iterator tools, as Ron suggests, strikes me as completely
the wrong thing to do if what we actually want is to group items. It
doesn't solve the problem at hand, since it's still up to the caller
to make their own grouper tool out of the memoising primitive.
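For comparison, grouping needs no look-ahead at all. The following is only a sketch, not the function proposed in this thread; the name batch and the choice to yield a short final list are assumptions made for the example. It uses nothing beyond itertools.islice and consumes exactly the items it returns:

```python
from itertools import islice

def batch(iterable, n):
    """Yield successive lists of up to n items (hypothetical helper)."""
    if n < 1:
        # Raised on the first next() call, since this is a generator.
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while True:
        # islice advances `it` by at most n items; nothing is cached
        # beyond the chunk that is about to be handed to the caller.
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


print(list(batch("abcde", 2)))
```

Because each chunk is pulled only when the caller asks for it, side-effects and time-dependent values occur when the group is requested rather than earlier.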
--
Steven