[SciPy-User] Vectorizing scipy.optimize.curve_fit

Paul Hobson pmhobson at gmail.com
Mon Feb 4 00:13:23 EST 2013


On Sun, Feb 3, 2013 at 10:47 AM, <josef.pktd at gmail.com> wrote:

> On Sun, Feb 3, 2013 at 1:29 PM, Paul Hobson <pmhobson at gmail.com> wrote:
> >
> > On Sun, Feb 3, 2013 at 4:28 AM, <josef.pktd at gmail.com> wrote:
> >>
> >> On Sat, Feb 2, 2013 at 9:38 PM, Ryan Nelson <rnelsonchem at gmail.com>
> wrote:
> >> > Hi Paul,
> >> >
> >> > I've played around with this for scipy.optimize.leastsq, and I've
> never
> >> > been
> >> > successful. I'm not an expert, but I don't think the original code was
> >> > set
> >> > up for that type of usage. (Just speculating here... An equally likely
> >> > explanation is that I'm not smart enough to figure it out :)
> >> >
> >> > What I've found really useful for this type of problem is IPython's
> new
> >> > parallel architecture (IPython version 0.13.1). It's easy to set up an
> >> > IPython cluster and get things running in parallel through the
> notebook
> >> > interface, so I've attached a simple notebook that does a trivial
> >> > fitting
> >> > (using leastsq) of some noisy data. Before you run the notebook,
> you'll
> >> > need
> >> > to set up a cluster through the Cluster tab in the IPython notebook
> >> > dashboard.
> >> >
> >> > Unfortunately, in my experience I've found that the speed improvement
> is
> >> > only noticeable until the number of IPython engines in your cluster
> >> > equals
> >> > the number of cores not the number of processor threads. (This might
> be
> >> > obvious for those in the know, but it took me a while to figure out.)
> >> > But
> >> > every little bit of improvement helps, I suppose.
> >> >
> >> > Ryan
> >> >
> >> >
> >> > On 2/1/2013 6:07 PM, Paul Hobson wrote:
> >> >
> >> > Hey folks,
> >> >
> >> > I've run  into a bit of a roadblock. I've got some model runs (x) in
> an
> >> > Nx2
> >> > array where the first column is the input, and the second column is
> the
> >> > output. So in a single case, I'd do:
> >> >
> >> > popt, pcov = scipy.optimize.curve_fit(myFit, x[:,0], x[:,1])
> >> >
> >> > But how should I handle, say, 5000 model runs such that x.shape =
> (500,
> >> > N,
> >> > 2) and I want the 5000 results for popt?
> >> >
> >> > This works:
> >> > popt_array = np.empty(5000, 2)
> >> > for r, layer in enumerate(model_runs):
> >> >     popt, pcov = scipy.optimize.curve_fit(myFit, layer[:,0]
> layer[:,1])
> >> >     popt_array[r] = popt
> >> >
> >> > But is there a better (faster way)? The number of model runs and data
> >> > points
> >> > may grow sustatially (~10^4 runs and 10^3 data points).
> >>
> >> I think there is no efficient way to avoid the loop.
> >>
> >> besides going parallel:
> >>
> >> If you are only interested in popt, then using optimize.leastsq
> >> directly will save some calculations.
> >>
> >> The other major speedup for this kind of problems is to find good
> >> starting values. For example, if the solutions in the sequence are
> >> close to each other, then using previous solutions as starting values
> >> for the next case will speed up convergence.
> >> IIRC, in statsmodels we use an average of previous solutions, after
> >> some initial warm-up, in a similar problem. A moving average could
> >> also be useful.
> >>
> >> Josef
> >>
> >> >
> >> > Thanks,
> >> > -paul
> >>
> >
> > Thanks for the advice, Ryan and Josef. I've been meaning to get a feel of
> > for IPython Parallel for some time now, so this will be a good impetus
> to do
> > so.
>
> If you just want to run it on multiple cores, then joblib is easier.
> scikit-learn uses it extensively, and statsmodels uses it at the few
> parts, especially for bootstrap.
>
> >
> > Also, Josef, the suggestion about using optimize.leastsq with an initial
> > guess, will be a very good one, I think. A better description of what I'm
> > doing is actually bootstrapping the regression parameters to find their
> 95%
> > confidence intervals. So it make a lot sense, really, to get the initial
> > guess at the parameters from the full data set for use in the bootstrap
> > iterations.
>
> The only problems I would worry a bit about in this case are the
> possible presence of multiple local minima, if there are any, and
> convergence failures.
>
> Josef
>
>
Thanks for the advice. Currently working with linear fits, but I'm hoping
to expand this work to be more general. I'll be sure to keep these things
in mind.
-p
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20130203/72e66728/attachment.html>


More information about the SciPy-User mailing list