[Tutor] any best practice on how to glue tiny tools together

Kent Johnson kent37 at tds.net
Fri Feb 6 12:44:11 CET 2009


On Fri, Feb 6, 2009 at 4:11 AM, Daniel <daniel.chaowang at gmail.com> wrote:
> Hi Tutors,
>
> I want to use python to finish some routine data processing tasks
> automatically (on Windows).
>
> The main task can be split into smaller sub-tasks. Each can be done by
> running a small tool like "awk" or another python script.
> One example of such a task is a data processing job that includes:
>
> use tool-A to produce some patterns.
> feed tool-B with these patterns to mine more related data
> repeat these tasks circularly until meeting some conditions.
>
> The real task includes more tools which run in parallel or sequential
> manner.
>
> I know how to do this with modules like subprocess, but the final python
> program looks somewhat messy and hard to adapt for changes.
>
> Do you have any best practices on this?

My first thought was to use shell pipelines and bash. Then I remembered
that David Beazley shows how to use generators to implement a processing
pipeline in Python:
http://www.dabeaz.com/generators-uk/

It's a fascinating read. It might take a couple of passes to absorb, but
it might fit your needs quite well. You would write a generator that
wraps a subprocess call and use it to access the external tools; the
other pieces and the control logic would be in Python.

Kent

