Itertools wishlists

Raymond Hettinger vze4rx4y at verizon.net
Sun Mar 13 07:52:40 CET 2005


[Ville Vainio]
> >For quick-and-dirty stuff, it's often convenient to flatten a sequence
> >(which perl does, surprise surprise, by default):
 . . .
> >but something like this would be handy in itertools as well.
> >
> >It seems trivial, but I managed to screw up several times when trying
> >to produce my own implementation (infinite recursion).

[Christos TZOTZIOY Georgiou]
> See Python Library Reference, "5.16.3 Recipes".  Now that all and any (also
> presented as recipes there) are considered to be included, perhaps flatten
> gets a chance too.
>
> This is just a personal opinion, but I detest restraints on library (itertools
> module in this case) expansion when talking about such useful *building
> blocks*.  What happened to "batteries included"?

FWIW, requests for additions to the itertools module have not fallen on deaf
ears.  There are no arbitrary restraints on building out this module.  Each
request has gotten careful thought and a couple of them were accepted in Py2.4
(itertools.tee and itertools.groupby).  If you would like to know the reasoning
behind any particular acceptance, rejection, or deferral, then just ask.

itertools.window() with n=2 got rejected.  Almost all proposed uses had better
solutions (such as an accumulator variable or Fibonacci sequence style logic:
a, b = b, a+b).  Writing it in C afforded only a small speed advantage over a
solution using izip() and tee().
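
For illustration, a minimal sketch of both alternatives, assuming Python 3
(where izip() is spelled zip()).  The names pairwise and fib are hypothetical
helpers chosen for this example, not functions offered by itertools at the time:

```python
from itertools import tee

def pairwise(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)        # advance the second copy by one element
    return zip(a, b)     # zip() is the Python 3 spelling of izip()

def fib(n):
    """Return the first n Fibonacci numbers using the a, b = b, a+b idiom."""
    a, b = 0, 1
    result = []
    for _ in range(n):
        result.append(a)
        a, b = b, a + b  # no window object needed, just two variables
    return result
```

The second function shows why many n=2 use cases need no window at all: the
"previous" element is simply carried along in a variable.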

itertools.window() with n>2 was also rejected.  While use cases arise in Markov
chains, statistics (moving averages, etc.), and cryptanalysis (trigraph
analysis), there were almost always better solutions.  window() spent most of
its time creating new tuples and shifting each of the common elements by one
position.  A solution using collections.deque is generally superior because the
popleft() and append() operations do not entail moving all the common elements.
It was instructive to examine a use case with a large n, sliding window
compression -- the Right Solution (tm) does *not* entail continuously shifting
all of the data elements.  IOW, providing a window() function in itertools is a
mistake because it leads people away from better solutions.
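
A rough sketch of the deque approach, using a moving average as the example.
The function name moving_average and its parameters are assumptions made for
this illustration:

```python
from collections import deque

def moving_average(iterable, n):
    """Yield the mean of each run of n consecutive values."""
    window = deque()
    total = 0
    for x in iterable:
        window.append(x)               # new element enters on the right
        total += x
        if len(window) > n:
            total -= window.popleft()  # oldest element leaves on the left
        if len(window) == n:
            yield total / n
```

Each step touches only the element entering and the element leaving; the
elements in between are never copied or shifted, and no tuple is built per
position.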

The jury is still out on flatten().  The principal implementation problem is not
recursing into iterable objects that the caller wants to treat as atomic.  While
that can be done, there doesn't seem to be a clear winner among the solutions
that have arisen.  Also, the solutions to that problem make the resulting
function more difficult to learn, remember, review, etc.  The nature of
flattening is such that a C implementation doesn't offer any special advantage
over the various competing pure Python versions.  And, there is also the issue
of use cases.  It appears to be much more fun to toy around with developing
flatten() recipes than it is to work on applications that require it.  That is
not to say that it doesn't come up or that it isn't helpful in Mathematica;
however, it is somewhat rare and not always the right solution even when it
could be used.
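
To make the "atomic" problem concrete, here is one possible sketch among the
many competing versions, assuming Python 3.  The atomic parameter and its
default are assumptions of this example, not a settled design:

```python
def flatten(iterable, atomic=(str, bytes)):
    """Recursively flatten nested iterables, leaving atomic types intact."""
    for item in iterable:
        if isinstance(item, atomic):
            yield item                 # caller wants this treated as a leaf
            continue
        try:
            sub = iter(item)
        except TypeError:
            yield item                 # not iterable at all: a leaf
        else:
            yield from flatten(sub, atomic)
```

Without the atomic check, a one-character string iterates to itself and the
recursion never bottoms out, which is the infinite recursion mentioned in the
quoted message.  The extra parameter is also exactly the kind of complication
that makes the function harder to learn and review.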

itertools.queue() was rejected because it didn't fit naturally into
applications -- you had to completely twist your logic around just to
accommodate it.  Besides, it is already simple to iterate over a list while
appending items to it as needed.
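
As a small illustration of that last point, a sketch of a work list that grows
while it is being iterated.  The function reachable and the dict-of-lists graph
format are hypothetical, chosen only for this example:

```python
def reachable(graph, start):
    """Return the nodes reachable from start, in order of discovery."""
    todo = [start]
    seen = {start}
    for node in todo:                 # the loop also visits items appended below
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                todo.append(neighbor)
    return todo
```

The list serves as the queue and as the result, so no special queue iterator is
needed.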

itertools.roundrobin() was too specialized (being the engine behind a
multi-input task server) and that specialty often demanded more complex
capabilities than offered by roundrobin.  For many situations, collections.deque
offers a better solution.
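
For reference, the basic round-robin behavior is short to write on top of
collections.deque.  This is a sketch assuming Python 3; the scheduling policy
shown (drop an input once it is exhausted) is one assumption among several
possible ones:

```python
from collections import deque

def roundrobin(*iterables):
    """Yield one item from each input in turn until all are exhausted."""
    pending = deque(iter(it) for it in iterables)
    while pending:
        it = pending.popleft()
        try:
            value = next(it)
        except StopIteration:
            continue                  # exhausted input is simply not re-queued
        yield value
        pending.append(it)            # still live: send it to the back
```

Because the deque is ordinary data, an application that needs more (priorities,
adding tasks on the fly) can manipulate it directly instead of fighting a fixed
tool.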

itertools.accumulate() returned the successive steps of a reduce() operation.
It too had precedents in APL and Mathematica.  However, it loses its appeal with
functions more complex than operator.add or operator.mul.  The effort to write a
more complex function is better spent writing a simple generator.
Alternatively, many potential applications were better served by in-lining the
function.  The other issue was usability.  accumulate() suffered from the same
complexities as reduce() -- it took too much thought to read, write, and review
(quick, which variable comes first, the cumulative value or the new data element;
does it fold left or fold right; is there a default initial value; yada yada).
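
A sketch of the "simple generator" alternative, with hypothetical names
(running_totals, balances) chosen for this example:

```python
def running_totals(iterable):
    """Yield the cumulative sum after each element."""
    total = 0
    for x in iterable:
        total += x
        yield total

def balances(deposits, rate, start=0.0):
    """Yield an account balance after each period's interest and deposit."""
    balance = start
    for deposit in deposits:
        balance = balance * (1 + rate) + deposit
        yield balance
```

In the second function the update rule is spelled out in the loop body, so
there is no question about argument order, fold direction, or the initial
value.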

itertools.multigen() transforms single shot generators into something
re-iterable.  The offered use cases were not compelling.  The potential for
misuse is high and the logic behind the idea doesn't appear to be
self-consistent.

itertools.remove_value() is like a Lisp/Scheme multirember.  Yawn.  Just use a
genexp:  (elem for elem in iterable if elem != value).

The jury is still out on itertools.eq() which compares any two iterables for
equality.  Sometimes you want to compare [1,2,3] to (1,2,3) and consider only
the contents of the container rather than the type of the container.  It is
somewhat handy and obvious; however, someone is bound to misuse it and apply it
to arbitrarily ordered containers such as sets and dictionaries; apply it to
infinite iterators; or just inadvertently exhaust an iterator prior to actually
needing its contents.
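
A minimal sketch of what such a comparison might look like, assuming Python 3.
The name iter_equal is hypothetical and this is not the proposed
implementation:

```python
from itertools import zip_longest

_MISSING = object()   # sentinel marking the end of the shorter input

def iter_equal(a, b):
    """Return True if both iterables yield equal items in the same order."""
    for x, y in zip_longest(a, b, fillvalue=_MISSING):
        if x is _MISSING or y is _MISSING or x != y:
            return False
    return True
```

The hazards are easy to reproduce: passing an iterator leaves it consumed
afterwards, and two equal infinite iterators would never return.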

itertools.consume() would work like map() with no arguments and no return
value (equivalent to list(it) where you throw away the result list).  This is
more clearly coded in pure Python, and a C version runs only slightly faster.
Also, it is only useful with functions that have side-effects and that is at
odds with the ideas of functional programming where itertools have their roots.
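
The pure Python spelling is about as short as a function can be.  A sketch:

```python
def consume(iterable):
    """Run an iterable to exhaustion, discarding every item."""
    for _ in iterable:
        pass
```

Calling it only makes sense when producing the items has side-effects, for
example consume(log.append(x) for x in data), which is the functional
programming objection in a nutshell.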

itertools.ilines() would iterate over a buffer and yield upon hitting a
universal newline -- essentially this is an in-memory version of what the file
iterator does with text files.  This kind of operation is more appropriately
added to StringIO.

In addition to the above, people routinely request that all sorts of random
ideas be put in itertools.  "That shouldn't be a builtin; stick it in
itertools."  We likely need some other module for reduction functions like
any(), all(), no(), quantify(), take(), etc.  In general, the itertools module
would be ill served by becoming a dumping ground.

Another thought is that it takes some time and skill to learn to use the tools
and how to combine them.  The time and skill seem to rise exponentially with
the number of tools in the module.  So, it would be a mistake to add a few
obscure, rarely used tools because that would impact the usability of the
existing toolset.

'nuff said, mister whatever happened to batteries included ;-)



Raymond Hettinger


P.S.  It's not an accident that the recipes in the itertools docs are in a form
that is directly cut and pastable into a working application.




