[SciPy-User] Least-squares fittings with bounds: why is scipy not up to the task?

Fri Mar 9 01:20:37 EST 2012

A response to Gael:

I don't think the problem is just a question of motivation/effort being put
in by people who are asking for new features.  Perhaps I'm overly
optimistic, but I think that most people are aware of the effort put in by
very busy people to put together scipy and are rather grateful that the
tool exists.  Many of us would like to see python adopted more in our
organizations and find that some tools that our users are used to are not
available and would like to see them made available.   However, I think
that the barrier to inclusion in scipy seems high.

A bit of history:

We also used the IDL version of  mpfit--in moving to python, I looked for
an analogue and found mpfit.py that at the time, relied on Numeric.   I
made a port to numpy and found Sergei Koposov had made a similar port (
http://code.google.com/p/astrolibpy/wiki/AstrolibpyContents) in addition to
fixing some bugs in the original and adding extensions.   I talked with all
of the stakeholders to receive licensing permission for inclusion into
scipy.   However, there were questions about the coding style, and it never
made it in.  (I'm happy to see the lmfit project on github)

Also, related to bounds, in the past, the scipy implementation of simulated
annealing would wander outside of bounds (which is not my expected
behavior--or that of many people), so I made a patch, which if a guess
would place the next point out of bounds, would stay at the same spot and
guess again (it may not be optimal, but it preserves ergodicity).   I was
told that if someone actually wanted the function to stay in bounds, they
could add a penalty function.

The end result:  I have my own version of anneal.py and mpfit.py.   I would
like to contribute.  I have students that work on things that might be of
general use.  However, the process needs to be more streamlined if the
community wants more participation.   Sympy is a good example of this--if
you have something that seems useful, they are very happy to take it--and
clean it up later (or help you to).  I've only watched sckit-learn from a
far, but http://scikit-learn.org/stable/developers/index.html seems to
provide rather clear instructions for contributing...

Take the question of bounds for example--is it better to have no easy way
of implementing bounds, or to have the cleanest/most efficient piece of
code?  What is the actual process of contributing these days?   For
example, for making a patch, now that the codebase is on github.  Do we
make a fork, patch, and point to the fork?  Submit a patch?  If so, where?

What exactly are scikits?  What determines if something belongs in
scipy.optimize as compared to a scikit?  What is the process for creating a
scikit?  The webpage is a bit vague.   Do scikits share more than a
namespace?

Sorry that this is a bit disorganized, but the TL;DR is that I think scipy
could do more to make it easier for people to contribute...I understand the
need to have maintainable code in a large project, but in many cases,
having a less than perfect implementation (with tests) would be better than
having no implementation...Also, what may be easy for us, may not be easy
to many users of scipy, so having convenience methods is worthwhile...

Best,
William

On Thu, Mar 8, 2012 at 10:14 PM, David Baddeley <david_baddeley at yahoo.com.au
> wrote:

> From a pure performance perspective, you're probably going to be best
> setting your bounds by variable substitution (particularly if they're only
> single-ended - x**2 is cheap) - you really don't want to have the for
> loops, dictionary lookups and conditionals that lmfit introduces for it's
> bounds checking inside your objective function.
>
> I think a high level wrapper that permitted bounds, an unadulterated goal
> function, and setting which parameters to fit, but also retained much of
> the raw speed of leastsq could be accomplished with some clever on the fly
> code generation (maybe also using Sympy to automatically derive the
> Jacobian). Would make an interesting project ...
>
> David
>
>   ------------------------------
> *From:* Eric Emsellem <eemselle at eso.org>
> *To:* Matthew Newville <matt.newville at gmail.com>
> *Cc:* scipy-user at scipy.org; scipy-user at googlegroups.com
> *Sent:* Friday, 9 March 2012 12:17 PM
>
> *Subject:* Re: [SciPy-User] Least-squares fittings with bounds: why is
> scipy not up to the task?
>
>
>
> > Yes, see https://github.com/newville/lmfit-py,  which does everything
> > you ask for, and a bit more, with the possible exception of "being
> > included in scipy".  For what its worth, I work with Mark Rivers
> > (who's no longer actively developing Python), and our group is full of
> > IDL users who are very familiar with Markwardt's implementation.
> >
> > The lmfit-py version uses scipy.optimize.leastsq(), which uses MINPACK
> > directly, so has the advantage of not being implemented in pure IDL or
> > Python. It is definitely faster than mpfit.py.
> >
> > With lmfit-py, one writes a python function-to-minimize that takes a
> > list of Parameters instead of the array of floating point variables
> > that scipy.optimize.leastsq() uses. Each Parameter can be freely
> > varied of fixed, have upper and/or lower bounds placed on them, or be
> > written as algebraic expressions of other Parameters.  Uncertainties
> > in varied Parameters and correlations between Parameters are estimated
> > using the same "scaled covariance" method as used in
> > scipy.optimize.curve_fit().  There is limited support for
> > optimization methods other than scipy.optimize.leastsq(), but I don't
> > find these methods to be very useful for the kind of fitting  problems
> > I normally see, so support for them may not be perfect.
> >
> > Whether this gets included into scipy is up to the scipy developers.
> > I'd be happy to support this module within scipy or outside scipy.
> > I have no doubt that improvements could be made to lmfit.py.  If you
> > have suggestion, I'd be happy to hear them.
>
> looks great! I'll have a go at this, as mentioned in my previous post. I
> believe that leastsq is probably the fastest anyway (according to the
> test Adam mentioned to me today) so this could be it. I'll make a test
> and compare it with mpfit (for the specific case I am thinking of, I am
> optimising over ~10^5-6 points with ~90 parameters...).
>
> thanks again for this, and I'll try to report on this (if relevant) asap.
>
> Eric
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20120309/dba5914c/attachment.html>