[SciPy-User] Fitting procedure to take advantage of cluster

Jose Gomez-Dans jgomezdans at gmail.com
Wed Jun 29 18:46:36 EDT 2011


Hi,

On 29 June 2011 17:54, J. David Lee <johnl at cs.wisc.edu> wrote:

> I'm attempting to perform a fit of a model function's output to some
> measured data. The model has around 12 parameters, and takes tens of
> minutes to run. I have access to a cluster with several thousand
> processors that can run the simulations in parallel, so I'm wondering if
> there are any algorithms out there that I can use to leverage this
> computing power to efficiently solve my problem
>

We have a similar problem at the moment. It consists of inverting a model
(i.e., finding the model parameters that result in the smallest misfit between
observations and model output, under some assumptions about how you combine
data & model output). The model typically has hundreds of input variables, is
very nonlinear, and takes "a long time" to run. Usually, we need to invert
lots and lots of sets of observations. The model code is Fortran (f2py-ed for
numpy goodness), and there's also a version that uses OpenMP to parallelise
some internal loops. Additionally, we took advantage of automatic
differentiation (AD) techniques (e.g. Tapenade,
<http://tapenade.inria.fr:8080/tapenade/index.jsp>) to calculate the model's
derivative with respect to its inputs (and also the derivative of the function
that combines the mismatch of obs & model output, usually referred to as the
"cost function"). This was pretty hard, and I wouldn't try it at home :) Once
you have the gradient, you can use fast optimisation methods (L-BFGS, for
example; see the sketch below). The next stage for us was to parallelise runs
over a cluster using IPython's parallelisation capabilities: if you have lots
of independent model runs, you can parallelise those, or you can parallelise
experiments (a sketch of that follows as well).
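
To make the gradient-based step concrete, here's a minimal sketch using
scipy's L-BFGS wrapper with a user-supplied gradient. The exponential toy
model and its hand-coded gradient are only placeholders for the expensive
simulator and its AD-generated adjoint (everything named here is illustrative,
not our actual code):

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Toy stand-in for the expensive simulator (hypothetical): in practice
# this call would go to the f2py-wrapped Fortran model.
def model(p, t):
    return p[0] * np.exp(-p[1] * t) + p[2]

def cost(p, t, obs):
    # Least-squares mismatch between model output and observations.
    r = model(p, t) - obs
    return 0.5 * np.dot(r, r)

def cost_grad(p, t, obs):
    # Gradient of the cost w.r.t. the parameters. Hand-coded here for the
    # toy model; in the real setup this is what the AD tool (Tapenade)
    # generates for you.
    r = model(p, t) - obs
    J = np.column_stack([np.exp(-p[1] * t),
                         -p[0] * t * np.exp(-p[1] * t),
                         np.ones_like(t)])
    return np.dot(J.T, r)

t = np.linspace(0.0, 5.0, 50)
obs = model(np.array([2.0, 1.3, 0.5]), t) + 0.01 * np.random.randn(t.size)
p_best, f_best, info = fmin_l_bfgs_b(cost, x0=np.ones(3),
                                     fprime=cost_grad, args=(t, obs))
print(p_best)

And farming out independent runs is essentially a one-liner once the engines
are up. This sketch assumes IPython's parallel machinery (packaged these days
as ipyparallel), a set of engines already started on the cluster (e.g. with
ipcluster), and a run_model function that is importable on the engines:

import ipyparallel as ipp

rc = ipp.Client()                 # connect to the running engines
view = rc.load_balanced_view()    # dynamic load balancing across engines

# param_sets: a list of parameter vectors; each run is independent, so
# the whole batch can be dispatched to the cluster in one map call.
results = view.map_sync(run_model, param_sets)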

We looked at Gaussian Process emulators too, as Giovanni suggested (see the
papers by O'Hagan as well). However, the problem is that our model typically
has several outputs (think of them as correlated time series: for example, a
time series of the outflow of rivers in a basin), and that isn't easy to
handle with GPs. If your model produces a scalar output, though, they can be
very efficient and are easy to implement (a minimal sketch follows).
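
For a scalar-output model, a bare-bones GP emulator is only a few lines of
numpy. This is the textbook Cholesky formulation (Rasmussen & Williams,
Algorithm 2.1) with a fixed squared-exponential kernel; a real emulator would
also fit the kernel hyperparameters, which I skip here:

import numpy as np

def sqexp(A, B, ell=1.0, sigma2=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_fit(X, y, ell=1.0, sigma2=1.0, noise=1e-8):
    # X: (n, d) training parameter sets, y: (n,) scalar model outputs.
    K = sqexp(X, X, ell, sigma2) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha

def gp_predict(Xs, X, L, alpha, ell=1.0, sigma2=1.0):
    # Posterior mean and variance of the emulated model at new inputs Xs.
    Ks = sqexp(Xs, X, ell, sigma2)
    mean = np.dot(Ks, alpha)
    v = np.linalg.solve(L, Ks.T)
    var = sigma2 - (v ** 2).sum(axis=0)
    return mean, var

Once trained on a few hundred (parameter set, output) pairs, gp_predict is
essentially free to evaluate, so you can hand the emulator to the optimiser
in place of the real model.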

Finally, if you know your model's behaviour pretty well, you can precalculate
pairs of input parameters/output values and build a look-up table (LUT). Think
of the LUT as a poor man's GP emulator (no uncertainty estimates, no
derivatives, etc.). This is the sort of approach that is used operationally in
my field (remote sensing) to invert complex radiative transfer models fast. I
think using something like scipy's vector quantisation would be fairly fast
and straightforward; see the sketch below.
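
A minimal sketch of the LUT inversion with scipy.cluster.vq: fill the table
offline (the expensive, embarrassingly parallel part), then invert each
observation by nearest-neighbour lookup in whitened output space. run_model,
the parameter bounds lo/hi and the observations array are placeholders:

import numpy as np
from scipy.cluster import vq

# 1. Build the LUT offline: sample the parameter space and run the model
#    once per sample. lo, hi: (n_params,) arrays of parameter bounds;
#    run_model: your simulator, returning a vector of outputs.
n_samples, n_params = 100000, 12
params = np.random.uniform(lo, hi, size=(n_samples, n_params))
outputs = np.array([run_model(p) for p in params])

# 2. Whiten so every output dimension contributes equally to the distance.
std = outputs.std(axis=0)
codebook = outputs / std

# 3. Invert: vq assigns each observation (rows of an (n_obs, n_outputs)
#    array) to its nearest LUT entry; read the parameters back off the grid.
codes, dists = vq.vq(observations / std, codebook)
retrieved_params = params[codes]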

Hth,

Jose