Dataflow programming in Python

Lie Ryan lie.1296 at gmail.com
Sat Sep 12 08:10:26 CEST 2009


Anh Hai Trinh wrote:
> Hello all,
> 
> I just want to share with you something that I've worked on recently.
> It is a library which implements streams -- generalized iterators with
> a pipelining mechanism and lazy-evaluation to enable data-flow
> programming in Python.
> 
> The idea is to be able to take the output of a function that turns an
> iterable into another iterable and plug that as the input of another
> such function. While you can already do some of this using function
> composition, this package provides an elegant notation for it by
> overloading the '>>' operator.
> 
> To give a simple example of string processing, here we grep the lines
> matching some regex, strip them and accumulate to a list:
> 
>     import re
>     result = open('log').xreadlines() >> filter(re.compile('[Pp]attern').search) >> mapmethod('strip') >> list
> 
> This approach focuses the programming on processing streams of data,
> step by step. A pipeline usually starts with a generator, or anything
> iterable, then passes through a number of processors. Multiple streams
> can be branched and combined. Finally, the output is fed to an
> accumulator, which can be any function of one iterable argument.
> 
> Another advantage is that the values are lazily computed, i.e. only
> when the accumulator needs them.
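
For readers unfamiliar with the notation, here is a minimal sketch of how
'>>' can be overloaded to chain functions that take one iterable. It is not
the library's actual implementation: the Stream class and the keep and
mapmethod helpers below are hypothetical stand-ins, written for Python 3.

```python
import re
from collections.abc import Iterator


class Stream:
    """Hypothetical wrapper that makes an iterable pipeable with >>."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self._it

    def __rshift__(self, step):
        # 'step' is any function of one iterable argument.
        out = step(self._it)
        # A processor returns another lazy iterator, so keep it pipeable.
        # An accumulator (list, sum, ...) returns a final value as-is.
        return Stream(out) if isinstance(out, Iterator) else out


def keep(predicate):
    """Hypothetical processor: lazily keep items the predicate accepts."""
    return lambda iterable: filter(predicate, iterable)


def mapmethod(name):
    """Hypothetical stand-in: lazily call the named method on each item."""
    return lambda iterable: (getattr(item, name)() for item in iterable)


# Made-up input lines standing in for a log file.
lines = ["  a Pattern here \n", "nothing\n", "pattern again\n"]
result = (Stream(lines)
          >> keep(re.compile('[Pp]attern').search)
          >> mapmethod('strip')
          >> list)
# result == ['a Pattern here', 'pattern again']
```

Nothing is read from the input until the final accumulator (list here)
starts pulling items through the chain.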

Does it have any advantage over a generator expression?

import re
mm = mapmethod('strip')        # Is mapmethod something in the stdlib?
pat = re.compile('[Pp]attern')
result = (mm(line) for line in open('log') if pat.search(line))

which is also lazy.
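
A runnable sketch of the same generator expression, assuming Python 3 and
that mapmethod('strip') just calls .strip() on each item (so line.strip()
is used directly). The io.StringIO object and its contents are made up to
stand in for open('log').

```python
import io
import re

# Stand-in for open('log'); the contents are invented for illustration.
log = io.StringIO("  first Pattern hit \nno match here\npattern again\n")

pat = re.compile('[Pp]attern')

# Building the generator reads nothing from the file yet.
result = (line.strip() for line in log if pat.search(line))

first = next(result)   # reads only as far as the first matching line
rest = list(result)    # consumes the remaining lines
```

Here first is 'first Pattern hit' and rest is ['pattern again']; each line
is read only when the consumer asks for the next value.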



