Pickle based workflow - looking for advice
__peter__ at web.de
Mon Apr 13 19:08:50 CEST 2015
> I am writing a quite extensive piece of scientific software. Its
> workflow is quite easy to explain. The tool performs a series of
> operations on watersheds (such as mapping data onto them, geostatistics
> and more). There are thousands of independent watersheds of different
> sizes, and the size determines the computing time spent on each of them.
> Say I have the operations A, B, C and D. B and C are completely
> independent but they need A to be run first, D needs B and C, and so
> forth. Eventually the whole set of operations A, B, C and D will run
> once and for all, but of course the development is an iterative process
> and I rerun all operations many times.
> 4. Other comments you might have?
How about a file-based workflow?
Write distinct scripts, e.g.
a2b.py that reads from *.a and writes to *.b
and so on. Then use a plain old makefile to define the dependencies.
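A minimal makefile sketch for the A -> (B, C) -> D chain described above
(the script names and file suffixes are made up for illustration):

```make
# Hypothetical chain: step A produces *.a, steps B and C each
# consume *.a, and step D consumes the matching *.b and *.c.
%.b: %.a
	python a2b.py $< $@

%.c: %.a
	python a2c.py $< $@

%.d: %.b %.c
	python bc2d.py $^ $@
```

A nice side effect: make only reruns the steps whose inputs have
changed, which helps when you rerun the pipeline many times during
development.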
Whether .a uses pickle, .b uses json, and .z uses csv is but an
implementation detail that only its producers and consumers need to know.
Testing an arbitrary step is as easy as invoking the respective script with
some prefabricated input and checking the resulting output file(s).
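One such step script might look like the following sketch (the record
fields and the "operation" itself are invented placeholders; only the
read-pickle/write-json plumbing is the point):

```python
#!/usr/bin/env python3
"""a2b.py -- read one watershed record from a *.a file (pickle),
apply one processing step, and write the result to a *.b file (json).
"""
import json
import pickle
import sys


def transform(watershed):
    # Placeholder for the real operation; here we just derive
    # an extra field from a hypothetical "area_m2" entry.
    result = dict(watershed)
    result["area_km2"] = watershed["area_m2"] / 1e6
    return result


def main(infile, outfile):
    with open(infile, "rb") as f:
        watershed = pickle.load(f)   # *.a is pickle
    with open(outfile, "w") as f:
        json.dump(transform(watershed), f)  # *.b is json


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Because the transform is a plain function separated from the file I/O,
you can test it directly with a prefabricated record, without touching
the filesystem at all.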