[SciPy-User] tool for running simulations

Joshua Holbrook josh.holbrook at gmail.com
Fri Jun 17 15:47:07 EDT 2011


I did some parameter studies for my thesis (finite element analyses,
heat transfer) and something like this would definitely have been
useful. Of course, I also had some other problems: my simulations ran
in MATLAB/COMSOL in tandem rather than Python, and due to who-knows-what
I had a lot of segfaults. So for a tool like this to have been useful
for *that* particular project, I would have needed to mash it together
with a lightweight userspace process monitor of some sort that launches
the external process with the right parameters and restarts it when it
dies.
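
Something like this would've covered my case. It's just a sketch: the
MATLAB command line is made up, and a real monitor would want logging,
backoff, and a cap on total runtime.

    import subprocess
    import sys

    def run_with_restarts(cmd, max_restarts=5):
        # Run an external command, restarting it if it exits abnormally
        # (e.g. a segfault), up to max_restarts attempts.
        for attempt in range(max_restarts):
            ret = subprocess.call(cmd)
            if ret == 0:
                return  # clean exit, we're done
            sys.stderr.write("run %d died with code %d, restarting\n"
                             % (attempt, ret))
        raise RuntimeError("gave up after %d restarts" % max_restarts)

    # hypothetical command line for a MATLAB/COMSOL batch job
    run_with_restarts(["matlab", "-nodisplay", "-r", "run_simulation"])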

There may be some/many similarities between what you're talking about
doing and mapreduce frameworks such as Hadoop
(http://hadoop.apache.org/) or Disco (http://discoproject.org/). In
fact, you may find that one of these does basically what you want. If
so, I'd love to hear how it goes! I always kinda meant to get my hands
dirty with one of these but never did.
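
For what it's worth, a parameter study in mapreduce terms is just a map
step (one simulation per parameter set) and a reduce step (combine the
results). Here's that pattern with plain multiprocessing standing in
for what Hadoop or Disco would spread across machines; run_sim is a
placeholder model:

    from multiprocessing import Pool

    def run_sim(param):
        # map step: run one simulation per parameter value
        return param, param ** 2  # placeholder for the real model

    def combine(results):
        # reduce step: average the simulation outputs
        return sum(value for _, value in results) / float(len(results))

    if __name__ == "__main__":
        pool = Pool()  # one worker per CPU by default
        print(combine(pool.map(run_sim, [0.1, 0.2, 0.5, 1.0])))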

Good luck,

--Josh

On Fri, Jun 17, 2011 at 11:38 AM, Dan Goodman <dg.gmane at thesamovar.net> wrote:
> Hi all,
>
> I have an idea for a tool that might be useful for people writing and
> running scientific simulations. It may be that something like this
> already exists, in which case please let me know if you have any good
> suggestions! Otherwise, I might have a go at writing something, in
> which case any feature requests or useful advice would be great.
>
> Basically, the situation I keep finding myself in, and I assume many
> others do too, is that I have some rather complicated code to set up
> and run a simulation (in my case, computational neuroscience
> simulations). I typically want to run each simulation many times,
> possibly with different parameters, and then do some averaging or more
> complicated analysis at the end. Usually these simulations take
> anywhere from an hour to a week to run, depending on what I'm doing
> and assuming I'm using multiple computers/CPUs to do it. The issue is
> that I want to be able to run my code on several computers at once and
> have the results available on all of them. I've been coming up with
> all sorts of annoying ways to do this, for example having each
> computer write one file with a unique name and then merging them
> afterwards (roughly the sketch below), but this is quite tedious.
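>
> To be concrete, the current scheme looks more or less like this (just
> a sketch; the directory layout and names are made up):
>
>     import glob
>     import os
>     import pickle
>     import socket
>     import uuid
>
>     def save_result(result, outdir="results"):
>         # each machine/process writes its own uniquely named file
>         if not os.path.isdir(outdir):
>             os.makedirs(outdir)
>         fname = "%s-%s.pkl" % (socket.gethostname(), uuid.uuid4().hex)
>         with open(os.path.join(outdir, fname), "wb") as f:
>             pickle.dump(result, f)
>
>     def merge_results(outdir="results"):
>         # afterwards, gather everything back together by hand
>         return [pickle.load(open(fn, "rb"))
>                 for fn in glob.glob(os.path.join(outdir, "*.pkl"))]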
>
> What I imagine is a tool that does something like this:
>
> * Run a server process on each of several computers that controls
> file access (this avoids any issues with contention). One computer is
> the master, and if the others want to read or write a file, it is
> transferred to or from the master. Some files might want to be
> cached/mirrored on each computer for faster access (typically
> read-only files in my case). A toy version of this is sketched after
> the list.
>
> * Use a nice file format like HDF5 that allows fast access, stores
> metadata along with your data, and has good tools for browsing it.
> This is important because, as you change your simulation code, you
> might want to weed out some old data based on the metadata without
> having to recompute everything, etc.
>
> * Store multiple data entries (something like tables in HDF5, I
> guess) and select specific ones out for analysis.
>
> * Cache function results. For example, I often have a computation
> that takes about ten minutes for each set of parameter values and
> whose result is then used in several simulations. I'd like these
> results to be cached automatically, maybe based on a hash of the
> arguments to the function (see the sketch further down).
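>
> For the first point, I'm imagining something as small as this to
> start with (a toy sketch using the standard library's XML-RPC server;
> a real version would need locking, authentication and the
> caching/mirroring):
>
>     import xmlrpclib                                   # xmlrpc.client in Python 3
>     from SimpleXMLRPCServer import SimpleXMLRPCServer  # xmlrpc.server in Python 3
>
>     def read_file(path):
>         # Binary wrapper so arbitrary bytes survive the XML-RPC transport
>         return xmlrpclib.Binary(open(path, "rb").read())
>
>     def write_file(path, blob):
>         open(path, "wb").write(blob.data)
>         return True
>
>     # run this on the master; the other machines call these functions
>     # over the network instead of touching the files directly
>     server = SimpleXMLRPCServer(("0.0.0.0", 8000))
>     server.register_function(read_file)
>     server.register_function(write_file)
>     server.serve_forever()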
>
> As far as I can tell, there are tools to do each of the things above,
> but nothing that combines them all together simply. For example,
> there are lots of tools for distributed filesystems, for HDF5 and for
> function caching, but is there something that, when you call a
> function with some particular values, creates a hash, checks a
> (possibly distributed) HDF5 store for that hash, and then either
> returns the cached value or computes it and stores it in the HDF5
> file with the relevant metadata (maybe the values of the arguments
> and not just the hash)?
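>
> Something like this toy decorator is the shape I have in mind (a
> sketch using h5py, with the key computed from a pickle of the
> arguments; the distributed part is left out entirely):
>
>     import hashlib
>     import pickle
>
>     import h5py
>
>     def hdf5_cached(filename):
>         def decorator(func):
>             def wrapper(*args, **kwds):
>                 # hash the arguments to get a stable cache key
>                 key = hashlib.sha1(
>                     pickle.dumps((args, sorted(kwds.items())))).hexdigest()
>                 with h5py.File(filename, "a") as f:
>                     if key in f:
>                         return f[key][...]  # cache hit
>                     result = func(*args, **kwds)
>                     dset = f.create_dataset(key, data=result)
>                     # keep the argument values as metadata, not just the hash
>                     dset.attrs["args"] = repr((args, kwds))
>                     return result
>             return wrapper
>         return decorator
>
>     @hdf5_cached("cache.h5")
>     def slow_computation(a, b):
>         return a * b  # placeholder for the ten minute computation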
>
> Since all the tools are basically already there, I don't think this
> should take too long to write (maybe just a few days), but it could
> be useful for lots of people, because at the moment it requires
> mastering quite a few different tools and writing code to glue them
> together. The key thing is to choose the best tools for the job and
> take the right approach, so any ideas on that? Or maybe it's already
> been done?
>
> Dan
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>


