pipe function for itertools?

Hi, I have a blog entry, http://paddy3118.blogspot.com/2009/05/pipe-fitting-with-python-generators.ht..., in which I define a helper function to get around the reversed nesting of generators and the proliferation of nested brackets. See the blog entry for a fuller treatment, but in essence, if you have a series of generators and want the data to conceptually flow like this:

    gen1 -> gen2 -> gen3 -> gen4 ...

you have to write:

    ...(gen4(gen3(gen2(gen1()))))...

With pipe, you would write:

    pipe(gen1, gen2, gen3, gen4, ...)

which you could use like this:

    for data in pipe(...):
        do_something_with_data()

Here is the definition:

    def pipe(*cmds):
        gen = cmds[0]
        for cmd in cmds[1:]:
            gen = cmd(gen)
        for x in gen:
            yield x

A couple of readers thought that it might be a good tool for itertools to have. There are other solutions out there that use classes to overload '|' (or), but it seems more natural to me to use pipe(), even though I am a heavy Unix user. Please discuss.
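The idea can be exercised with a short, self-contained sketch; the stage functions and data below are illustrative, not from the blog entry:

```python
# A sketch showing how pipe() chains generator stages left to right.
def pipe(*cmds):
    gen = cmds[0]           # first item is the initial iterable
    for cmd in cmds[1:]:    # each later item wraps the previous stream
        gen = cmd(gen)
    for x in gen:
        yield x

def evens(nums):
    return (n for n in nums if n % 2 == 0)

def squares(nums):
    return (n * n for n in nums)

result = list(pipe(range(10), evens, squares))
print(result)  # [0, 4, 16, 36, 64]
```

Note that the first argument is consumed as an iterable rather than called, which is the signature quirk discussed in the replies below.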

On Tue, 26 May 2009 08:32:58 pm Donald 'Paddy' McCarthy wrote:
The function signature is misleading. It doesn't take a series of generator functions ("cmds"); it takes an initial iterable followed by a series of generator functions. It seems to me that a cleaner definition would be:

    def pipe(iterable, *generators):
        for gen in generators:
            iterable = gen(iterable)
        for x in iterable:
            yield x

This does seem to be a special case of function composition. If functools grew a compose() function, you could write:

    from functools import compose

    def pipe(it, *gens):
        for x in compose(*gens)(it):
            yield x

Here's an untested definition for compose:

    def compose(f, *funcs, **kwargs):
        if not funcs:
            raise TypeError('compose() requires at least two functions')
        if kwargs.keys() not in ([], ['doc']):
            # I wish Python 2.5 had keyword-only args...
            raise TypeError('bad keyword argument(s)')
        def function_composition(*args, **kwargs):
            value = f(*args, **kwargs)
            for g in funcs:
                value = g(value)
            return value
        function_composition.__doc__ = kwargs.get('doc')
        return function_composition

which is more complicated than your version, of course, but also more general.
A couple of readers thought that it might be a good tool for itertools to have.
Do you have any other use-cases? -- Steven D'Aprano
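Since functools has no compose() today, the composition-based pipe above can be sketched directly; this minimal compose and the stage lambdas are illustrative assumptions, not library code:

```python
# Minimal left-to-right compose(), rolled by hand because functools
# does not provide one; stage lambdas below are made-up examples.
def compose(*funcs):
    def composed(value):
        for f in funcs:
            value = f(value)
        return value
    return composed

def pipe(it, *gens):
    for x in compose(*gens)(it):
        yield x

double = lambda it: (x * 2 for x in it)
incr = lambda it: (x + 1 for x in it)
out = list(pipe([1, 2, 3], double, incr))
print(out)  # [3, 5, 7]
```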

[Donald 'Paddy' McCarthy]
I'm not too excited about this for several reasons.

* Many itertools do not stack linearly. The groupby() tool emits a stream of (key, generator) pairs that do not readily feed into other itertools (the generator part is meant to be consumed right away). The tee() tool emits multiple streams meant to be consumed contemporaneously. The zip tools take in multiple input streams. Some, like count() and repeat(), are meant to be fed into zip alongside other streams. IOW, piping is only a natural model for a limited subset of use cases.

* The existing approaches let you handle multiply nested tools by simply assigning intermediate results to variables:

      it1, it2 = tee(iterable)
      decorated = izip(imap(func, it1), count(), it2)
      processed = sometool(decorated, somearg)
      undecorated = imap(itemgetter(2), processed)
      return undecorated

* I don't like conflating the mental models for pipes with those for generators. While there are similarities, there are also differences that become less evident when a piping notation is used. Operating system pipes are buffered, and producer processes can get suspended while consumers catch up. With generators, the consumer functions are in charge, not the producers.

* There doesn't seem to be much of an advantage to a pipe notation for generators. No new capabilities are added. It is essentially just a new syntax for a subset of things we already do. And the notation becomes awkward and forced for use cases with multiple input streams and multiple output streams. Why add yet another way to do it?

Raymond
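The decorate/process/undecorate pattern in that message can be rendered runnable in Python 3 (zip and map replace izip and imap); the positive-key filter below is my stand-in for the unspecified "sometool":

```python
from itertools import count, tee
from operator import itemgetter

# Decorate each item with (key, position, item), process the decorated
# stream, then strip the decoration off again. The filter is only a
# stand-in for whatever "sometool" would actually do.
def keep_positive_keys(iterable, func):
    it1, it2 = tee(iterable)
    decorated = zip(map(func, it1), count(), it2)
    processed = (t for t in decorated if t[0] > 0)  # stand-in tool
    return map(itemgetter(2), processed)

result = list(keep_positive_keys([-2, 3, -1, 5], lambda x: x))
print(result)  # [3, 5]
```

The intermediate variable names make the non-linear plumbing (tee into zip) explicit, which is exactly the point being made against a pipe notation.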

I was one of the commenters who said you should put the idea here. I end up using generators in a pipe-like fashion quite a bit, so a pipe [iter, func]tool would be useful.

I also liked the head function listed on the page, because when using the interactive interpreter I often want to see just a part of a larger output without having to go to all the trouble of whipping up something using islice or izip + irange. I think we could also use a tail function; although obviously it wouldn't work with itertools.count, it is useful when you're filtering a large file and you just want to see the last part of the output to tell whether your filters are basically doing the right thing.

Of course, all of these functions are somewhat trivial, but it could save a little work for a lot of people if they were in the stdlib. Also, they could be made to work like the proposed "yield from" and pass through .send() and .throw() calls, etc., which actually is a fair amount of work to implement.

-- Carl Johnson
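Possible implementations of the head and tail helpers described in that message; the names, signatures, and defaults here are guesses, not code from the thread:

```python
from collections import deque
from itertools import islice

# Guessed sketches of head()/tail(). Note that tail() must consume
# the whole iterable, which is why it cannot work with infinite
# streams like itertools.count().
def head(iterable, n=10):
    return islice(iterable, n)

def tail(iterable, n=10):
    return iter(deque(iterable, maxlen=n))

first = list(head(range(100), 3))
last = list(tail(range(100), 3))
print(first, last)  # [0, 1, 2] [97, 98, 99]
```

The deque with maxlen keeps memory bounded to the last n items even for very large inputs.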

participants (5)
- Bruce Frederiksen
- Carl Johnson
- Donald 'Paddy' McCarthy
- Raymond Hettinger
- Steven D'Aprano