[Python-ideas] Batching/grouping function for itertools

Steven D'Aprano steve at pearwood.info
Sun Dec 8 17:45:19 CET 2013


On Mon, Dec 09, 2013 at 12:34:28AM +0900, Stephen J. Turnbull wrote:
> Chris Angelico writes:
> 
>  > How are you going to take the next n items from dice_roller without
>  > advancing it?
> 
> Memoize.

Er, I don't think so. How does the memoizing cache get those values if 
the underlying iterator isn't advanced? Obviously it can't. 
itertools.tee uses a cache, so we can demonstrate the issue:

py> import itertools
py> it = iter("abcde")
py> wrapper = itertools.tee(it, 2)[0]
py> _ = list(wrapper)

If the iterator hasn't advanced, then next(it) should yield 'a'. But:

py> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
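
The same thing can be seen with only partial consumption. This is just a 
sketch using the standard itertools.tee; the variable names are mine. 
Reading two items through one branch advances the underlying iterator by 
two, and anything then taken directly from the underlying iterator never 
reaches the cache at all:

```python
import itertools

it = iter("abcde")
a, b = itertools.tee(it, 2)

# Pulling two items through one branch fills the shared cache,
# and it can only do that by advancing the underlying iterator.
first_two = [next(a), next(a)]

# The underlying iterator is therefore already past 'a' and 'b'.
taken_directly = next(it)

# The other branch replays the cached items, then continues from
# wherever the underlying iterator is now. The item taken directly
# was never cached, so this branch does not see it.
b_values = list(b)
```

Here first_two is ['a', 'b'], taken_directly is 'c', and b_values is 
['a', 'b', 'd', 'e'].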


Any sort of "iterator look-ahead" has a number of fundamental problems. 
Those problems are part of the reason why, despite many requests, Python 
iterators don't provide a "peek" method to look ahead, not even by a 
single value, let alone an arbitrary number of values.

- The cache would require unbounded memory (unless you limit 
  the look-ahead to N values);

- iterators with side-effects would cause those side-effects 
  at the wrong time;

- iterators whose values are time-dependent could have those 
  values calculated at a different time from when they are 
  returned, potentially giving the wrong result.
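
To make the second point concrete, here is a minimal sketch of a 
one-item look-ahead wrapper. The Peekable class and the noisy generator 
are hypothetical, written only for illustration; nothing like them is in 
itertools. The side-effect of producing the first value happens when we 
peek, before the caller has actually asked for the item:

```python
class Peekable:
    """Hypothetical one-item look-ahead wrapper (illustration only)."""

    _EMPTY = object()  # sentinel meaning "nothing cached"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = self._EMPTY

    def __iter__(self):
        return self

    def peek(self):
        # Peeking has no choice but to advance the underlying
        # iterator and remember the value it produced.
        if self._cache is self._EMPTY:
            self._cache = next(self._it)
        return self._cache

    def __next__(self):
        if self._cache is not self._EMPTY:
            value, self._cache = self._cache, self._EMPTY
            return value
        return next(self._it)


log = []

def noisy():
    # A generator with a side-effect: it records when each value
    # is actually produced.
    for i in range(3):
        log.append("produced %d" % i)
        yield i

p = Peekable(noisy())
p.peek()
log.append("caller asks for first item")
first = next(p)
```

Afterwards log is ['produced 0', 'caller asks for first item'], so the 
value was produced before it was requested. Limiting the look-ahead to 
one item bounds the memory, but does nothing about the timing.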


For something like tee, it is difficult to see any way other than 
memoisation to get the functionality needed, so we just have to live 
with the limitations. But offering dedicated look-ahead with caching as 
fundamental iterator tools, as Ron suggests, strikes me as completely 
the wrong thing to do if what we actually want is to group items. It 
doesn't solve the problem at hand, since it's still up to the caller 
to make their own grouper tool out of the memoising primitive.
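
For comparison, grouping needs no look-ahead at all. The following is 
only a sketch of one possible approach, not a proposal for the actual 
itertools function; the name batches and its behaviour for a short final 
group are my assumptions. It consumes exactly the items it hands back 
and caches nothing beyond the current group:

```python
from itertools import islice

def batches(iterable, n):
    """Yield lists of up to n items (hypothetical name, sketch only)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while True:
        # islice advances the iterator by at most n items, and only
        # when the caller asks for the next group.
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk

groups = list(batches("abcde", 2))

# The underlying iterator is never advanced further than the
# groups already handed out.
source = iter(range(10))
g = batches(source, 3)
first_group = next(g)
next_from_source = next(source)
```

Here groups is [['a', 'b'], ['c', 'd'], ['e']], first_group is 
[0, 1, 2] and next_from_source is 3.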


-- 
Steven

