[SciPy-User] tool for running simulations
David Baddeley
david_baddeley at yahoo.com.au
Fri Jun 17 20:46:49 EDT 2011
Hi Dan,
I've cobbled together something for distributed data analysis (not simulation)
which does quite a few of the things you want to do. I didn't really do much
planning, and it more or less grew features as I needed them. This means that
a number of the architectural choices are a bit questionable in hindsight, and
to some extent specific to my application (distributed analysis of image data),
but it might give you some ideas - let me know if you're interested and I can
flick you the code and/or tease it out as a separate subproject & open-source
it (I'll have to see how deeply entwined it is in the larger project, which I
can't open at this time). At any rate, I should be able to give you some
pointers.
There are two parts: a server, which
- accepts image frames from the software driving a camera & saves them, along
with metadata, in HDF5 format
- alternatively accepts generic 'Tasks', which can be pretty much anything
- allows clients/workers to request tasks to process
- collates the results and saves them to a separate HDF5 file, having
propagated the metadata
and clients, which
- request a task
- optionally request additional data/metadata from the server
- process the task & submit the result back to the server
I'm using Pyro (http://irmen.home.xs4all.nl/pyro3/) for IPC and HDF5 for data
storage. The whole lot is platform-agnostic (we run it on a mix of Windows,
Linux, and Mac machines), and Pyro makes the IPC really easy.
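To give a feel for how little code the Pyro side needs, here's a minimal
sketch of the server/worker split described above. It uses the Pyro4-style
API (the link above is Pyro 3, whose API differs slightly), and the names
TaskServer, post_task, get_task and submit_result are illustrative rather
than the actual ones in my code:

# Minimal task-server sketch (Pyro4-style API; names are illustrative).
import queue
import Pyro4


@Pyro4.expose
class TaskServer(object):
    def __init__(self):
        self._tasks = queue.Queue()   # pending tasks
        self._results = []            # collated results

    def post_task(self, task):
        """Accept a generic 'Task' from whoever generates the work."""
        self._tasks.put(task)

    def get_task(self):
        """Hand the next pending task to a worker (None if nothing is waiting)."""
        try:
            return self._tasks.get_nowait()
        except queue.Empty:
            return None

    def submit_result(self, result):
        """Collect a finished result; the real server writes these to HDF5."""
        self._results.append(result)


if __name__ == '__main__':
    daemon = Pyro4.Daemon(host="0.0.0.0")   # listen on all interfaces
    uri = daemon.register(TaskServer())
    print("server running at", uri)         # workers connect using this URI
    daemon.requestLoop()

A worker is then just a loop that calls Pyro4.Proxy(uri).get_task(), processes
the task, and hands the outcome back via submit_result().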
Using a single server & Pyro means it's limited to problems where each task
takes long enough that the communication overhead isn't too high. If you want
to use HDF5, I'd suggest sticking to a single server which provides the
requested data to the clients, rather than having each client independently
trying to read the HDF5 files over e.g. a shared file system. I spent some
time trying to work out how best to synchronise HDF5 file access across
different processes and didn't come up with any easy solution. My original
idea had been to write the data to .hdf5 from the camera software and then
just tell each of the workers where it was - this works if you're only doing
read-only access, but falls over badly when you need to read and write.
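For what it's worth, the pattern that worked for me is to have exactly one
process hold the HDF5 file open and funnel every read and write through it. A
rough sketch, assuming h5py and a threaded Pyro daemon (FrameStore, get_frame
and put_frame are made-up names, as is the 512x512 frame size):

# Single process owns the HDF5 file; workers never touch it directly.
import threading
import h5py


class FrameStore(object):
    def __init__(self, filename):
        self._lock = threading.Lock()
        self._h5 = h5py.File(filename, 'a')   # the one and only open handle
        if 'frames' not in self._h5:
            self._h5.create_dataset('frames', shape=(0, 512, 512),
                                    maxshape=(None, 512, 512), dtype='uint16')

    def get_frame(self, index):
        with self._lock:                       # one reader/writer at a time
            return self._h5['frames'][index]   # comes back as a numpy array

    def put_frame(self, frame):
        with self._lock:
            ds = self._h5['frames']
            ds.resize(ds.shape[0] + 1, axis=0)  # append along the first axis
            ds[-1] = frame
            self._h5.flush()

Expose an object like this over Pyro (registered on the same daemon as the
task server) and the cross-process locking problem goes away, because only one
process ever has the file open.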
cheers,
David
----- Original Message ----
From: Dan Goodman <dg.gmane at thesamovar.net>
To: scipy-user at scipy.org
Sent: Sat, 18 June, 2011 6:38:38 AM
Subject: [SciPy-User] tool for running simulations
Hi all,
I have an idea for a tool that might be useful for people writing and
running scientific simulations. It might be that something like this
already exists, in which case if anyone has any good suggestions please
let me know! Otherwise, I might have a go at writing something, in which
case any feature requests or useful advice would be great.
Basically, the situation I keep finding myself in, and I assume many others
do too, is that I have some rather complicated code to set up and run a
simulation (in my case, computational neuroscience simulations). I typically
want to run each simulation many times, possibly with different parameters,
and then do some averaging or more complicated analysis at the end. Usually
these simulations take around 1 hour to 1 week to run, depending on what I'm
doing and assuming I'm using multiple computers/CPUs to do it. The issue is
that I want to be able to
run my code on several computers at once, and have the results available
on all the computers. I've been coming up with all sorts of annoying
ways to do this, for example having each computer generate one file with
a unique name, and then merging them afterwards - but this is quite tedious.
What I imagine is a tool that does something like this:
* Run a server process on each of several computers that controls file access
(this avoids any issues with contention). One computer is the master, and if
the other ones want to read or write a file then it is transferred to the
master. Some files might want to be cached/mirrored on each computer for
faster access (typically read-only files in my case).
* Use a nice file format like HDF5 that allows fast access, stores metadata
along with your data, and has good tools for browsing the data. This is
important because as you change your simulation code, you might want to weed
out some old data based on the metadata, but not have to recompute everything,
etc.
* Allow you to store multiple data entries (something like tables in HDF5, I
guess) and then select out specific ones for analysis (see the sketch after
this list).
* Allow you to use function caching. For example, I often have a computation
that takes about 10 minutes for each set of parameter values and is then used
in several simulations. I'd like these results to be cached automatically
(maybe based on a hash of the arguments to the function).
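For the metadata part, what I have in mind is something like the following
rough sketch using h5py (save_run, select_runs and runs.h5 are just made-up
names, not an existing tool):

import h5py


def save_run(filename, name, data, **metadata):
    # Store one simulation result as an HDF5 group, with the parameters
    # attached as attributes so they can be queried later.
    with h5py.File(filename, 'a') as f:
        grp = f.create_group(name)
        grp.create_dataset('data', data=data)
        for key, value in metadata.items():
            grp.attrs[key] = value              # e.g. tau=10.0, n_neurons=4000


def select_runs(filename, **criteria):
    # Return the data of every stored run whose metadata matches the criteria.
    results = []
    with h5py.File(filename, 'r') as f:
        for name, grp in f.items():
            if all(grp.attrs.get(k) == v for k, v in criteria.items()):
                results.append(grp['data'][:])  # read the array into memory
    return results


# e.g. save_run('runs.h5', 'run_001', [0.1, 0.2, 0.3], tau=10.0)
#      traces = select_runs('runs.h5', tau=10.0)

That's only the single-machine piece, of course - the distributed part is the
harder bit.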
As far as I can tell, there are tools to do each of the things above, but
nothing that combines them all together simply. For example, there are lots
of tools for distributed filesystems, for HDF5, and for function value
caching, but is there something that, when you call a function with some
particular values, creates a hash, checks a distributed HDF5 store for that
hash, and then either returns the stored value or computes it and stores it
in the HDF5 file with the relevant metadata (maybe the values of the
arguments and not just the hash)?
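Concretely, the glue I'm imagining for the caching part is something like the
following (a rough sketch on a single shared HDF5 file, using h5py; hdf5_cached
and cache.h5 are made-up names, the distributed lookup is left out, and the
cached result is assumed to be array-like):

import hashlib
import pickle
from functools import wraps

import h5py


def hdf5_cached(cache_file):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the function name together with its arguments.
            key = hashlib.sha1(
                pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            ).hexdigest()
            with h5py.File(cache_file, 'a') as f:
                if key in f:                           # cache hit
                    return f[key][:]
                result = func(*args, **kwargs)         # cache miss: compute it
                ds = f.create_dataset(key, data=result)
                ds.attrs['function'] = func.__name__   # keep the metadata,
                ds.attrs['args'] = repr(args)          # not just the hash
                ds.attrs['kwargs'] = repr(kwargs)
                return result
        return wrapper
    return decorator


# @hdf5_cached('cache.h5')
# def tuning_curve(tau, n):
#     ...   # the 10-minute computation

The missing piece is making that lookup work across machines, which is where
the server idea above would come in.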
Since all the tools are basically already there, I don't think this should
take too long to write (maybe just a few days), but it could be useful for
lots of people, because at the moment it requires mastering quite a few
different tools and writing code to glue them together. The key thing is to
choose the best tools for the job and take the right approach, so any ideas
for that? Or maybe it's already been done?
Dan