itertools expansion time has arrived?

Raymond Hettinger vze4rx4y at verizon.net
Fri Aug 22 19:31:10 CEST 2003


[Christos TZOTZIOY Georgiou]
> Hi all, although more like "hi Raymond" :)

Hello!


> Since 2.3 is out, now, perhaps it's time to add the itertools.window,
> itertools.weave (and maybe others I did not happen to read about).   I
> don't know if Raymond needs any help, I'd be glad to.

See the patch at www.python.org/sf/756253 .
Jack developed implementations for window() and roundrobin().
Upon review, I rejected window() on the basis that the C version
offered no compelling advantage over the pure python version.
So, we included the pure python version in the docs.  Jack is still
refining itertools.roundrobin().

IIRC, weave() was a merge iterator and an implementation was
proposed along with a stream class:  www.python.org/sf/774414
The merge iterator was rejected on the basis on not being
sufficiently general purpose -- meaning that it doesn't seem to
be a basic building block for iterator algebra.  Also, that specific
implementation did not allow for a custom sort order.

The stream class is the start of a good idea.  Over time, I expect
that it will evolve into a general purpose lazy list object that is
directly substitutable for real lists but has an underlying iterator
that supplies data on a just in time basis.  Much experimentation
is needed because there are thorny issues:

* is it safe to implement a len() method knowing that users may
   have potentially infinite underlying iterators?

* likewise, how would you handle:   s[:-3]

* does the implementation respond correctly to a series of
   appends, inserts, and deletes?  to do that, it must be able
   to keep a running tab of all of the changes or will have
   to run the underlying iterator and throw away the advantages
   of just-in-time data production.

* will memory utilization be surprising when the stream object
   usually consumes no memory and then a seemingly innocuous
   query causes much of the data to be loaded into an internal list?

* does it new class play nicely with the other itertools?  more
   specifically, is it easy to compose new functions by combining
   the new class with the existing tools?  my early experiments
   indicate that they don't combine well.

* the promise (holy grail?) of the new class is to provide a basis
   for creating new generators using Haskell like self-reference
   to the stream being created.   Unfortunately, the ideas presented
   to date are not sufficiently powerful to express something like this:

            fib  = 1 : 1 : [ a+b | (a,b) <- zip fib (tail fib) ]

   Such a construct would have been especially nice for expressing
   the proposed window() function.

One other itertools proposal was a function that consumes the
underlying iterator and returns nothing.  Functional folks will
hate this because it is all about programming with side-effects:

   itertools.consume(mydict.__setitem__, keygen(), valuegen()) -->  None

Speed freaks will like it because it is a for-loop that doesn't need
to loop through the byte code evaluation cycle on each pass.

I'm keeping an open mind on all these ideas.  From the feedback
so far, users are excited about and like the toolset.  However,
combining the tools requires some mental effort and that effort
gets much has when even a single additional tool is added.  So, I'm
trying to keep it small.

Also, python's generators have excellent performance and make it
trivially easy to make your own custom new itertools if the need
arises.  As a result, I feel very little pressure to add new tools.

Whenever someone proposes a new, general purpose combination
of the tools, I have added it to the examples in the docs.   Reading
those examples will rapidly increase one's sophistication in using
the toolset.


Raymond Hettinger







More information about the Python-list mailing list