[SciPy-User] tool for running simulations

Dan Goodman dg.gmane at thesamovar.net
Sun Jun 19 16:40:29 EDT 2011


Thanks everyone for ideas and suggestions. Jug seems to come closest to 
what I was thinking of. My only concern is its use of a Redis database, 
which I'm not sure I like the sound of, though perhaps not for 
particularly good reasons, and quite probably it's something I could 
work with.

I was aware of Sumatra, and know Andrew Davison (in fact my main project 
is hosted by his Neural Ensemble group). It seems like a great project, 
but it is still alpha and perhaps more focused on reproducibility than 
on managing relatively large amounts of data. Also, the function caching 
part is quite important for what I have in mind, and of the suggestions 
people sent me, I think only Jug includes this as standard.
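
For reference, my understanding of how Jug's caching works, as a 
minimal untested sketch (the simulate function is just a stand-in):

    from jug import TaskGenerator

    @TaskGenerator
    def simulate(tau):
        # an expensive simulation; Jug memoises the result in its
        # store, keyed on the function and its arguments
        return tau ** 2

    results = [simulate(tau) for tau in range(10)]

You then run "jug execute myscript.py" on each machine, and any task 
whose result is already in the shared store is skipped.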

David - I'd certainly be interested in seeing your code if it's easy to 
put something together. Send me an email.

Dan

On 17/06/2011 20:38, Dan Goodman wrote:
> Hi all,
>
> I have an idea for a tool that might be useful for people writing and
> running scientific simulations. It might be that something like this
> already exists, in which case, if anyone has good suggestions, please
> let me know! Otherwise, I might have a go at writing something, in
> which case any feature requests or useful advice would be welcome.
>
> Basically, the situation I keep finding myself in, and I assume many
> others do too, is that I have some rather complicated code to set up
> and run a simulation (in my case, computational neuroscience
> simulations). I typically want to run each simulation many times,
> possibly with different parameters, and then do some averaging or more
> complicated analysis at the end. These simulations usually take from
> around an hour to a week to run, depending on what I'm doing and
> assuming I'm using multiple computers/CPUs. The issue is that I want
> to be able to run my code on several computers at once and have the
> results available on all of them. I've been coming up with all sorts
> of annoying ways to do this, for example having each computer generate
> one file with a unique name and then merging the files afterwards, but
> this is quite tedious.
>
> What I imagine is a tool that does something like this:
>
> * Run a server process on each of several computers to control file
> access (this avoids any contention issues). One computer is the
> master, and when the others want to read or write a file, it is
> transferred to the master. Some files might be cached/mirrored on each
> computer for faster access (typically read-only files, in my case).
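>
> Something along these lines could probably be prototyped with the
> standard library's multiprocessing.managers (a rough, untested
> sketch; the class and method names are all hypothetical):
>
>     from multiprocessing.managers import BaseManager
>
>     class FileStore(object):
>         # runs only on the master, the single point of file access
>         def read(self, name):
>             with open(name, 'rb') as f:
>                 return f.read()
>         def write(self, name, data):
>             with open(name, 'wb') as f:
>                 f.write(data)
>
>     class StoreManager(BaseManager):
>         pass
>
>     store = FileStore()
>     StoreManager.register('get_store', callable=lambda: store)
>
>     # on the master:
>     #   server = StoreManager(address=('', 50000), authkey=b'secret')
>     #   server.get_server().serve_forever()
>     # on each worker:
>     #   m = StoreManager(address=('master', 50000), authkey=b'secret')
>     #   m.connect()
>     #   m.get_store().write('results.dat', data)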
>
> * Use a nice file format like HDF5 that allows fast access, stores
> metadata alongside the data, and has good tools for browsing it. This
> is important because, as you change your simulation code, you might
> want to weed out some old data based on the metadata without having to
> recompute everything, etc.
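>
> With h5py, for instance, that might look something like this (a
> hypothetical sketch; the dataset and attribute names are made up):
>
>     import numpy as np
>     import h5py
>
>     # store a result with the parameters that produced it
>     with h5py.File('results.h5', 'a') as f:
>         dset = f.create_dataset('run_0001', data=np.random.rand(1000))
>         dset.attrs['tau'] = 5.0
>         dset.attrs['code_version'] = '1.2'
>
>     # later: keep only runs from the current code version,
>     # without recomputing anything
>     with h5py.File('results.h5', 'r') as f:
>         keep = [name for name in f
>                 if f[name].attrs['code_version'] == '1.2']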
>
> * Store multiple data entries (something like tables in HDF5, I
> guess) and then select out specific ones for analysis.
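>
> PyTables seems designed for exactly this; roughly (an untested
> sketch, with made-up column names):
>
>     import tables
>
>     class Run(tables.IsDescription):
>         tau = tables.Float64Col()
>         rate = tables.Float64Col()
>
>     f = tables.open_file('runs.h5', 'a')
>     table = f.create_table('/', 'runs', Run)
>     row = table.row
>     row['tau'], row['rate'] = 5.0, 12.3
>     row.append()
>     table.flush()
>
>     # select out just the entries you want to analyse
>     rates = [r['rate'] for r in table.where('tau == 5.0')]
>     f.close()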
>
> * Support function caching. For example, I often have a computation
> that takes about 10 minutes for each set of parameter values and is
> then used in several simulations. I'd like these results to be cached
> automatically (maybe based on a hash of the arguments to the
> function).
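>
> In its simplest, purely in-memory form I imagine something like
> this (a hypothetical sketch):
>
>     import functools
>     import hashlib
>
>     def cached(func):
>         store = {}
>         @functools.wraps(func)
>         def wrapper(*args):
>             # key the cache on a hash of function name + arguments
>             key = hashlib.sha1(
>                 repr((func.__name__, args)).encode()).hexdigest()
>             if key not in store:
>                 store[key] = func(*args)
>             return store[key]
>         return wrapper
>
>     @cached
>     def slow_setup(tau, rate):
>         # stands in for the ~10 minute computation
>         return tau * rate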
>
> As far as I can tell, there are tools to do each of the things above,
> but nothing that combines them all simply. For example, there are
> lots of tools for distributed filesystems, for HDF5 and for
> function-value caching, but is there something that, when you call a
> function with some particular values, creates a hash, checks a
> distributed version of HDF5 for that hash and then either returns the
> stored value or computes and stores it in the HDF5 file with the
> relevant metadata (maybe the values of the arguments and not just the
> hash)?
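>
> Concretely, and ignoring the distributed part, the glue I have in
> mind would look roughly like this (a hypothetical, untested sketch
> using h5py; in the distributed version the file access would go
> through the master process described above):
>
>     import hashlib
>     import h5py
>
>     def hdf5_cached(filename):
>         def decorator(func):
>             def wrapper(*args):
>                 key = hashlib.sha1(
>                     repr((func.__name__, args)).encode()).hexdigest()
>                 with h5py.File(filename, 'a') as f:
>                     if key in f:
>                         # already computed: return the stored value
>                         return f[key][()]
>                     result = func(*args)
>                     dset = f.create_dataset(key, data=result)
>                     # store the argument values, not just the hash
>                     dset.attrs['args'] = repr(args)
>                     return result
>             return wrapper
>         return decorator
>
>     @hdf5_cached('cache.h5')
>     def expensive(tau, rate):
>         return tau * rate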
>
> Since all the tools are basically already there, I don't think this
> should take too long to write (maybe just a few days), but it could
> be useful for lots of people, because at the moment it requires
> mastering quite a few different tools and writing code to glue them
> together. The key thing is to choose the best tools for the job and
> take the right approach, so any ideas on that? Or maybe it's already
> been done?
>
> Dan



