[SciPy-User] tool for running simulations
Dan Goodman
dg.gmane at thesamovar.net
Fri Jun 17 14:38:38 EDT 2011
Hi all,
I have an idea for a tool that might be useful for people writing and
running scientific simulations. It might be that something like this
already exists; if so, and anyone has good suggestions, please let me
know! Otherwise, I might have a go at writing something, in which case
any feature requests or useful advice would be great.
Basically, the situation I keep finding myself in, and I assume many
others do too, is that I have some rather complicated code to set up
and run a simulation (in my case, computational neuroscience
simulations). I typically want to run each simulation many times,
possibly with different parameters, and then do some averaging or more
complicated analysis at the end. These simulations usually take
anywhere from an hour to a week to run, depending on what I'm doing and
assuming I'm using multiple computers/CPUs. The issue is that I want to be able to
run my code on several computers at once, and have the results available
on all the computers. I've been coming up with all sorts of annoying
ways to do this, for example having each computer generate one file with
a unique name, and then merging them afterwards - but this is quite tedious.
What I imagine is a tool that does something like this:
* Run a server process on each of several computers that controls file
access (this avoids any issues with contention). One computer is the
master, and if the other ones want to read or write a file, the request
goes through the master. Some files might want to be cached/mirrored on
each computer for faster access (typically read-only files in my case).
There's a rough sketch of the master idea just after this list.
* Use a nice file format like HDF5 that allows fast access, lets you
store metadata along with your data, and has good tools for browsing
it. This is important because, as you change your simulation code, you
might want to weed out some old data based on the metadata without
having to recompute everything, etc.
* Allow you to store multiple data entries (something like tables in
HDF5, I guess) and then select specific ones out for analysis (there's
a sketch of this after the list too).
* Allow function caching. For example, I often have a computation that
takes about 10 minutes for each set of parameter values and is then
used in several simulations. I'd like these results to be cached
automatically (maybe based on a hash of the arguments to the function).
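To give an idea of what I mean by the first point, here is a rough
sketch of the master process using nothing but the standard library
(multiprocessing.managers). The host name, port and authkey are just
placeholders, and a real tool would put files or HDF5 nodes behind the
manager rather than a plain dict:

from multiprocessing.managers import BaseManager

results = {}                        # lives only on the master machine

class ResultManager(BaseManager):
    pass

def run_master(port=50000, authkey=b'simulation'):
    # Expose the dict over the network; every worker gets a proxy to
    # this one object, so there is no file contention to worry about.
    ResultManager.register('get_results', callable=lambda: results)
    manager = ResultManager(address=('', port), authkey=authkey)
    manager.get_server().serve_forever()    # run once, on the master

def connect_worker(master_host, port=50000, authkey=b'simulation'):
    # On a worker machine: connect and get a proxy to the master's dict.
    # (master_host, the port and the authkey are placeholders here.)
    ResultManager.register('get_results')
    manager = ResultManager(address=(master_host, port), authkey=authkey)
    manager.connect()
    return manager.get_results()

A worker would then do something like
store = connect_worker('master.example.org') and use store.update(...)
and store.get(...) to write and read results from any machine.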
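For the HDF5 side, the sort of thing I have in mind (using h5py here,
but PyTables would do just as well) is storing each run with its
metadata as attributes, and selecting entries by metadata later. The
file name, group layout and metadata keys are only examples:

import h5py

def store_run(filename, run_name, result, **metadata):
    # Append one simulation result plus its metadata to the HDF5 file.
    with h5py.File(filename, 'a') as f:
        runs = f.require_group('runs')
        dset = runs.create_dataset(run_name, data=result)
        for key, value in metadata.items():
            dset.attrs[key] = value     # metadata travels with the data

def select_runs(filename, **criteria):
    # Yield (name, array) for every stored run whose metadata matches.
    with h5py.File(filename, 'r') as f:
        if 'runs' not in f:
            return
        for name, dset in f['runs'].items():
            if all(dset.attrs.get(k) == v for k, v in criteria.items()):
                yield name, dset[...]

So you could call store_run('results.h5', 'run_0001', rates,
code_version='1.2', tau_m=0.02) on one machine, and later use
select_runs('results.h5', code_version='1.2') to pull out only the runs
from the current version of the code, without recomputing anything.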
As far as I can tell, there are tools to do each of the things above,
but nothing that combines them all simply. For example, there are
lots of tools for distributed filesystems, for HDF5 and for function
value caching, but is there something that, when you call a function
with some particular values, creates a hash, checks a distributed HDF5
store for that hash, and then either returns the cached value or
computes it and stores it in the HDF5 file with the relevant metadata
(maybe the values of the arguments and not just the hash)?
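To make that concrete, the kind of thing I mean is a decorator roughly
like this (the names are made up, it assumes the results are arrays and
that the arguments pickle deterministically, and the distributed part
would sit behind the file access):

import functools
import hashlib
import pickle
import h5py

def cached_to_hdf5(filename):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the function name and its arguments to get a cache
            # key (assumes the arguments pickle deterministically).
            raw = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            key = hashlib.sha1(raw).hexdigest()
            with h5py.File(filename, 'a') as f:
                cache = f.require_group('cache')
                if key in cache:
                    return cache[key][...]        # cache hit: reuse it
                result = func(*args, **kwargs)    # cache miss: compute...
                dset = cache.create_dataset(key, data=result)
                # ...and record the arguments too, not just the hash
                dset.attrs['function'] = func.__name__
                dset.attrs['args'] = repr(args)
                dset.attrs['kwargs'] = repr(kwargs)
                return result
        return wrapper
    return decorator

A ten-minute parameter computation then just gets decorated with
@cached_to_hdf5('results.h5'), and any machine that can see results.h5
gets the stored value back instead of recomputing it.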
Since all the tools are basically already there, I don't think this
should take too long to write (maybe just a few days), but it could be
useful for lots of people, because at the moment it requires mastering
quite a few different tools and writing code to glue them together. The
key thing is to choose the best tools for the job and take the right
approach, so any ideas for that? Or maybe it's already been done?
Dan