[SciPy-User] tool for running simulations
David Baddeley
david_baddeley at yahoo.com.au
Fri Jun 17 20:46:49 EDT 2011
Hi Dan,
I've cobbled together something for distributed data analysis (not simulation)
which does quite a few of the things you want to do. I didn't really do much
planning, and it more or less grew features as I needed them. This means that
a number of the architectural choices are a bit questionable in hindsight, and
to some extent specific to my application (distributed analysis of image data),
but it might give you some ideas - let me know if you're interested and I can
flick you the code and/or tease it out as a separate subproject & open-source
it (I'll have to see how deeply entwined it is in the larger project, which I
can't open at this time). At any rate, I should be able to give you some
pointers.
There are two parts: a server, which
- accepts image frames from the software driving a camera & saves them, along
with metadata, in HDF5 format
- alternatively accepts generic 'Tasks', which can be pretty much anything
- allows clients/workers to request tasks to process
- collates the results and saves them to a separate HDF5 file, having
propagated the metadata
and clients, which
- request a task
- optionally request additional data/metadata from the server
- process the task & submit the result back to the server
I'm using Pyro (http://irmen.home.xs4all.nl/pyro3/) for IPC and HDF5 for data
storage. The whole lot is platform-agnostic (we run it on a mix of Windows,
Linux, and Mac machines), and Pyro makes the IPC really easy.
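To give a feel for how little code the Pyro side needs, here's a minimal
sketch of the server/worker split described above. It uses the Pyro4-style
API (the link above is Pyro 3, whose API differs slightly), and the names
TaskServer, post_task, get_task and submit_result are illustrative rather
than the actual ones in my code:

# Minimal task-server sketch (Pyro4-style API; names are illustrative).
import queue
import Pyro4


@Pyro4.expose
class TaskServer(object):
    def __init__(self):
        self._tasks = queue.Queue()   # pending tasks
        self._results = []            # collated results

    def post_task(self, task):
        """Accept a generic 'Task' from whoever generates the work."""
        self._tasks.put(task)

    def get_task(self):
        """Hand the next pending task to a worker (None if nothing is waiting)."""
        try:
            return self._tasks.get_nowait()
        except queue.Empty:
            return None

    def submit_result(self, result):
        """Collect a finished result; the real server writes these to HDF5."""
        self._results.append(result)


if __name__ == '__main__':
    daemon = Pyro4.Daemon(host="0.0.0.0")   # listen on all interfaces
    uri = daemon.register(TaskServer())
    print("server running at", uri)         # workers connect using this URI
    daemon.requestLoop()

A worker is then just a loop that calls Pyro4.Proxy(uri).get_task(), processes
the task, and hands the outcome back via submit_result().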
Using a single server & Pyro means it's limited to problems where each task
takes long enough that the communication overhead isn't too high. If you want
to use HDF5, I'd suggest sticking to a single server which provides the
requested data to the clients, rather than having each client independently
trying to read the HDF5 files over e.g. a shared file system. I spent some
time trying to work out how best to synchronise HDF5 file access across
different processes and didn't come up with any easy solution. My original
idea had been to write the data to .hdf5 from the camera software and then
just tell each of the workers where it was - this works if you're only doing
read-only access, but falls over badly when you need to read and write.
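For what it's worth, the pattern that worked for me is to have exactly one
process hold the HDF5 file open and funnel every read and write through it. A
rough sketch, assuming h5py and a threaded Pyro daemon (FrameStore, get_frame
and put_frame are made-up names, as is the 512x512 frame size):

# Single process owns the HDF5 file; workers never touch it directly.
import threading
import h5py


class FrameStore(object):
    def __init__(self, filename):
        self._lock = threading.Lock()
        self._h5 = h5py.File(filename, 'a')   # the one and only open handle
        if 'frames' not in self._h5:
            self._h5.create_dataset('frames', shape=(0, 512, 512),
                                    maxshape=(None, 512, 512), dtype='uint16')

    def get_frame(self, index):
        with self._lock:                       # one reader/writer at a time
            return self._h5['frames'][index]   # comes back as a numpy array

    def put_frame(self, frame):
        with self._lock:
            ds = self._h5['frames']
            ds.resize(ds.shape[0] + 1, axis=0)  # append along the first axis
            ds[-1] = frame
            self._h5.flush()

Expose an object like this over Pyro (registered on the same daemon as the
task server) and the cross-process locking problem goes away, because only one
process ever has the file open.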
cheers,
David
----- Original Message ----
From: Dan Goodman <dg.gmane at thesamovar.net>
To: scipy-user at scipy.org
Sent: Sat, 18 June, 2011 6:38:38 AM
Subject: [SciPy-User] tool for running simulations
Hi all,
I have an idea for a tool that might be useful for people writing and
running scientific simulations. It might be that something like this
already exists, in which case if anyone has any good suggestions please
let me know! Otherwise, I might have a go at writing something, in which
case any feature requests or useful advice would be great.
Basically, the situation I keep finding myself in, and I assume many others
do too, is that I have some rather complicated code to set up and run a
simulation (in my case, computational neuroscience simulations). I typically
want to run each simulation many times, possibly with different parameters,
and then do some averaging or more complicated analysis at the end. Usually
these simulations take around 1 hour to 1 week to run, depending on what I'm
doing and assuming I'm using multiple computers/CPUs to do it. The issue is
that I want to be able to
run my code on several computers at once, and have the results available
on all the computers. I've been coming up with all sorts of annoying
ways to do this, for example having each computer generate one file with
a unique name, and then merging them afterwards - but this is quite tedious.
What I imagine is a tool that does something like this:
* Run a server process on each of several computers that controls file access
(this avoids any issues with contention). One computer is the master, and if
the other ones want to read or write a file then it is transferred to the
master. Some files might want to be cached/mirrored on each computer for
faster access (typically read-only files in my case).
* Use a nice file format like HDF5 that allows fast access, stores metadata
along with your data, and has good tools for browsing the data. This is
important because as you change your simulation code, you might want to weed
out some old data based on the metadata, but not have to recompute everything,
etc.
* Allow you to store multiple data entries (something like tables in HDF5, I
guess) and then select out specific ones for analysis (see the sketch after
this list).
* Allow you to use function caching. For example, I often have a computation
that takes about 10 minutes for each set of parameter values and is then used
in several simulations. I'd like these results to be cached automatically
(maybe based on a hash of the arguments to the function).
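For the metadata part, what I have in mind is something like the following
rough sketch using h5py (save_run, select_runs and runs.h5 are just made-up
names, not an existing tool):

import h5py


def save_run(filename, name, data, **metadata):
    # Store one simulation result as an HDF5 group, with the parameters
    # attached as attributes so they can be queried later.
    with h5py.File(filename, 'a') as f:
        grp = f.create_group(name)
        grp.create_dataset('data', data=data)
        for key, value in metadata.items():
            grp.attrs[key] = value              # e.g. tau=10.0, n_neurons=4000


def select_runs(filename, **criteria):
    # Return the data of every stored run whose metadata matches the criteria.
    results = []
    with h5py.File(filename, 'r') as f:
        for name, grp in f.items():
            if all(grp.attrs.get(k) == v for k, v in criteria.items()):
                results.append(grp['data'][:])  # read the array into memory
    return results


# e.g. save_run('runs.h5', 'run_001', [0.1, 0.2, 0.3], tau=10.0)
#      traces = select_runs('runs.h5', tau=10.0)

That's only the single-machine piece, of course - the distributed part is the
harder bit.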
As far as I can tell, there are tools to do each of the things above, but
nothing that combines them all together simply. For example, there are lots
of tools for distributed filesystems, for HDF5, and for function value
caching, but is there something that, when you call a function with some
particular values, creates a hash, checks a distributed HDF5 store for that
hash, and then either returns the stored value or computes it and stores it
in the HDF5 file with the relevant metadata (maybe the values of the
arguments and not just the hash)?
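Concretely, the glue I'm imagining for the caching part is something like the
following (a rough sketch on a single shared HDF5 file, using h5py; hdf5_cached
and cache.h5 are made-up names, the distributed lookup is left out, and the
cached result is assumed to be array-like):

import hashlib
import pickle
from functools import wraps

import h5py


def hdf5_cached(cache_file):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the function name together with its arguments.
            key = hashlib.sha1(
                pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            ).hexdigest()
            with h5py.File(cache_file, 'a') as f:
                if key in f:                           # cache hit
                    return f[key][:]
                result = func(*args, **kwargs)         # cache miss: compute it
                ds = f.create_dataset(key, data=result)
                ds.attrs['function'] = func.__name__   # keep the metadata,
                ds.attrs['args'] = repr(args)          # not just the hash
                ds.attrs['kwargs'] = repr(kwargs)
                return result
        return wrapper
    return decorator


# @hdf5_cached('cache.h5')
# def tuning_curve(tau, n):
#     ...   # the 10-minute computation

The missing piece is making that lookup work across machines, which is where
the server idea above would come in.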
Since all the tools are basically already there, I don't think this should
take too long to write (maybe just a few days), but it could be useful for
lots of people, because at the moment it requires mastering quite a few
different tools and writing code to glue them together. The key thing is to
choose the best tools for the job and take the right approach, so any ideas
for that? Or maybe it's already been done?
Dan