Dataflow programming in Python

Anh Hai Trinh anh.hai.trinh at
Fri Sep 11 21:09:37 CEST 2009

Hello all,

I just want to share with you something that I've worked on recently.
It is a library which implements streams -- generalized iterators with
a pipelining mechanism and lazy evaluation -- to enable data-flow
programming in Python.

The idea is to be able to take the output of a function that turns an
iterable into another iterable and plug it in as the input of another
such function. While you can already do some of this using function
composition, this package provides an elegant notation for it by
overloading the '>>' operator.
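
To show roughly how such a notation can work, here is a minimal sketch
in plain Python 3. It is not the library's actual code: the names
Processor, Flow, keep and call_method are made up for illustration, and
the only assumption is that a plain iterable on the left of '>>' does
not define that operator itself, so Python falls back to the right-hand
operand's __rrshift__.

```python
import re


class Processor:
    """Hypothetical wrapper around a function mapping an iterable to an iterable."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, source):
        # Called for `source >> self` when `source` (a list, file,
        # generator, ...) does not define `>>` itself.
        return Flow(self.func(source))


class Flow:
    """Hypothetical output of a pipeline stage; it can be piped further."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)

    def __iter__(self):
        return self._iterator

    def __rshift__(self, consumer):
        if isinstance(consumer, Processor):
            # Another processing stage: stay inside the pipeline.
            return Flow(consumer.func(self))
        # Otherwise treat it as an accumulator: any callable taking one iterable.
        return consumer(self)


def keep(predicate):
    # Illustrative stand-in for a filtering processor.
    return Processor(lambda items: (x for x in items if predicate(x)))


def call_method(name):
    # Illustrative stand-in for a processor calling a method on each item.
    return Processor(lambda items: (getattr(x, name)() for x in items))


lines = ["  Pattern one \n", "nothing here\n", " a pattern two\n"]
result = lines >> keep(re.compile('[Pp]attern').search) >> call_method('strip') >> list
# result == ['Pattern one', 'a pattern two']
```

Each stage here wraps a generator expression, so nothing is read from
the source until the final accumulator (list, in this case) iterates
over the pipeline.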

To give a simple example of string processing, here we grep the lines
matching some regex, strip them and accumulate them into a list:

> import re
> result = open('log').xreadlines() >> filter(re.compile('[Pp]attern').search) >> mapmethod('strip') >> list

This approach focuses the programming on processing streams of data,
step by step. A pipeline usually starts with a generator, or anything
iterable, then passes through a number of processors. Multiple streams
can be branched and combined. Finally, the output is fed to an
accumulator, which can be any function of one iterable argument.
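
The shape of such a pipeline (source, processors, branching, combining,
accumulator) can be approximated with the standard library alone. The
sketch below uses itertools.tee and itertools.chain as stand-ins for
branching and combining; it does not use the package's own notation,
and the helper names evens and doubled are made up for illustration.

```python
from itertools import chain, tee


def evens(numbers):
    # Processor: keep only the even values.
    return (n for n in numbers if n % 2 == 0)


def doubled(numbers):
    # Processor: multiply every value by two.
    return (2 * n for n in numbers)


source = iter(range(1, 7))            # the stream 1, 2, ..., 6
branch_a, branch_b = tee(source)      # branch one stream into two
combined = chain(evens(branch_a), doubled(branch_b))  # combine the branches
total = sum(combined)                 # accumulator: any function of one iterable
# total == 54  (2 + 4 + 6 from evens, 2 + 4 + ... + 12 from doubled)
```

Here sum plays the role of the accumulator; max, list or ''.join would
fit in the same position, since each takes a single iterable argument.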

Another advantage is that the values are computed lazily, i.e. only
when the accumulator asks for them.
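
Lazy evaluation can be observed with ordinary generators. The sketch
below is only an illustration in plain Python 3, not the library
itself; the names computed and squares are made up. It records which
inputs were actually processed when only three results are requested
from an infinite source.

```python
from itertools import count, islice

computed = []  # records which inputs were actually processed


def squares(numbers):
    for n in numbers:
        computed.append(n)
        yield n * n


pipeline = squares(count(1))      # infinite source; nothing has run yet
before = list(computed)           # still empty at this point

first_three = list(islice(pipeline, 3))
# first_three == [1, 4, 9] and computed == [1, 2, 3]:
# only the values the consumer asked for were ever computed.
```

Because nothing is computed ahead of demand, the same pipeline works on
sources that are very large or never end.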

