[Python-ideas] pipe function for itertools?

Raymond Hettinger python at rcn.com
Tue May 26 21:30:01 CEST 2009


[Donald 'Paddy' McCarthy]
> Hi, I have a blog entry, http://paddy3118.blogspot.com/2009/05/pipe-fitting-with-python-generators.html,
> in which I define a helper function to get around the problem of
> reversion in the nesting of generators and the proliferation of nested
> brackets.
> 
> See the blog entry for a fuller treatment, but in essence if you have
> a series of generators, and want the data to conceptually flow like
> this:
> 
> gen1 -> gen2 -> gen3 -> gen4 ...
> 
> You have to write:
> 
> ...(gen4(gen3(gen2(gen1()))))...
> 
> With pipe, you would write:
> 
> pipe(gen1, gen2, gen3, gen4, ...)

I'm not too excited about this for several reasons.

* Many itertools do not stack linearly.  The groupby() tool emits a stream of
(key, generator) pairs that do not readily feed into other itertools (the
generator part is meant to be consumed right away).  The tee() tool emits
multiple streams meant to be consumed contemporaneously.  The zip tools
take in multiple input streams.  Some, like count() and repeat(), are meant
to be fed into zip alongside other streams.  IOW, piping is only a natural
model for a limited subset of use cases.

* The existing approaches let you handle multiply nested tools by simply
assigning intermediate results to variables:

   it1, it2 = tee(iterable)
   decorated = izip(imap(func, it1), count(), it2)
   processed = sometool(decorated, somearg)
   undecorated = imap(itemgetter(2), processed)
   return undecorated

* I don't like conflating the mental models for pipes with those for
generators.  While there are similarities, there are also differences
that become less evident when a piping notation is used.  Operating system
pipes are buffered, and producer processes can get suspended while
consumers catch up.  With generators, the consumer functions are
in charge, not the producers.

* There doesn't seem to be much of an advantage to a pipe notation
for generators.  No new capabilities are added.  It is essentially just
a new syntax for a subset of things we already do.  And the notation
becomes awkward and forced for use cases with multiple input streams
and multiple output streams.  Why add yet another way to do it?
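
To make the "no new capabilities" point concrete, here is a minimal
sketch of what such a helper could look like.  The name pipe() and its
signature (a source iterable followed by one-argument stage functions)
are assumptions for illustration, not the definition from the blog
entry, and evens(), squares() and first_three() are hypothetical
stages.  It is written for Python 3.

```python
from functools import reduce
from itertools import count, islice


def pipe(source, *stages):
    """Hypothetical helper: pass source through each stage, left to right."""
    return reduce(lambda stream, stage: stage(stream), stages, source)


def evens(stream):
    # Keep only the even items.
    return (x for x in stream if x % 2 == 0)


def squares(stream):
    # Square each item.
    return (x * x for x in stream)


def first_three(stream):
    # Stop after three items so the infinite source terminates.
    return islice(stream, 3)


# The nested spelling and the piped spelling build the same chain.
nested = list(first_three(squares(evens(count()))))
piped = list(pipe(count(), evens, squares, first_three))
```

Both spellings yield the same items; the helper only reorders how the
chain is written.  Note that every stage here takes exactly one stream
and returns one stream, which is the linear case described above.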

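The remark that consumers, not producers, are in charge can be seen in
a small sketch.  The names producer(), doubled() and log are
hypothetical and exist only to record the order in which things run.
It is written for Python 3.

```python
from itertools import islice

log = []  # records the order in which each step actually executes


def producer():
    # Infinite source; it only advances when something asks for an item.
    n = 0
    while True:
        log.append(f"produce {n}")
        yield n
        n += 1


def doubled(stream):
    # Middle stage; it also runs only when the consumer pulls from it.
    for x in stream:
        log.append(f"double {x}")
        yield 2 * x


stream = doubled(producer())
before = list(log)               # building the chain runs nothing yet

taken = list(islice(stream, 2))  # the consumer pulls exactly two items
```

Nothing is produced until the consumer asks, and production stops as
soon as the consumer stops asking.  There is no buffer between stages
and no producer running ahead, unlike an operating system pipe.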

Raymond


