fmin_powell returns incorrect parameters for simple least-squares problem
Hi,

Could I ask for your collective advice?

While writing examples for students I found an instance of `fmin_powell` claiming success while returning an incorrect minimum, on a very simple least-squares problem. I've put up the reproducer in this repository:

https://github.com/matthew-brett/powell-fails

The take-home is that, for a simple least-squares problem and ordinary-looking data, with a particular starting value, `fmin_powell` stops at a non-minimum value and claims success, whereas other optimizers do find the minimum, as does the Octave implementation.

Here is the output from the reproducer:

```
LS inter: 2.114798570871842
LS slope: 0.5088641205763367
LS SSE error: 1.0674470960898728

BFGS minimization:
Optimization terminated successfully.
         Current function value: 1.067447
         Iterations: 3
         Function evaluations: 15
         Gradient evaluations: 5
[2.11480099 0.50886336]

Powell minimization:
Optimization terminated successfully.
         Current function value: 1.074363
         Iterations: 6
         Function evaluations: 152
[2.22069108 0.47457665]
```

As you can see, the Powell result has a higher sum of squared error and different parameters, yet claims success. I get this for SciPy 1.10.1 on Mac M2, Intel, and Ubuntu 22.04 amd64.

I notice that we have a "modified" Powell implementation. Can I ask for hints as to where to go next in exploring this problem? Could there be a flaw in our implementation?

Cheers,

Matthew
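For concreteness, here is a minimal sketch of the kind of comparison the reproducer runs. The data values and the `calc_sse` helper are stand-ins - the exact failing data live in the powell-fails repository - but the starting point `[2.25, 0.47]` matches the first evaluated point reported later in this thread, and `calc_sse`/`start` match the names used in the outputs below.

```python
import numpy as np
import scipy.optimize as spo

# Stand-in data: the exact values that trigger the failure are in the
# powell-fails repository; these are only illustrative.
rng = np.random.default_rng(0)
x = np.arange(10.0)
y = 2.1 + 0.5 * x + 0.3 * rng.standard_normal(x.size)

def calc_sse(params, x, y):
    # Sum of squared errors for a straight-line fit y ~ inter + slope * x.
    inter, slope = params
    return np.sum((y - (inter + slope * x)) ** 2)

start = np.array([2.25, 0.47])

for method in ('BFGS', 'Powell'):
    res = spo.minimize(calc_sse, start, args=(x, y), method=method)
    print(method, res.success, round(res.fun, 6), res.x)
```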
[image: trajectory.png]

This is the trajectory of each optimizer plotted over the loss. It seems like Powell has a hard time navigating narrow valleys. BFGS, on the other hand, locally fits a parabola, which is a good match for the loss function here.

This is what I get if I instead feed in noiseless data, further illustrating that Powell doesn't like ravines:

[image: traj_narrow.png]

The code is here: https://gist.github.com/Dapid/1da960739b9e006f41a962607f6b1c54

I can't say why SciPy's fails and Octave's doesn't, but hopefully this gives someone else an idea.

/David.

(PS: the images are 100 kB; I hope it is fine to send them on the list.)

On Wed, 1 Mar 2023 at 12:22, Matthew Brett <matthew.brett@gmail.com> wrote: [snip]
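David's gist (linked above) has his actual plotting code; what follows is only a sketch of one way to record and plot each optimizer's trajectory, reusing the stand-in `calc_sse`, `x`, `y`, and `start` from the earlier snippet.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as spo

trajectories = {}
for method in ('BFGS', 'Powell'):
    points = []

    def traced_sse(params, x, y, points=points):
        # Record every point the optimizer evaluates.
        points.append(np.array(params, dtype=float))
        return calc_sse(params, x, y)

    spo.minimize(traced_sse, start, args=(x, y), method=method)
    trajectories[method] = np.array(points)

# Contour plot of the loss with each trajectory overlaid.
inters = np.linspace(1.5, 3.0, 200)
slopes = np.linspace(0.3, 0.7, 200)
I, S = np.meshgrid(inters, slopes)
Z = np.array([[calc_sse((i, s), x, y) for i in inters] for s in slopes])
plt.contourf(I, S, Z, levels=50)
for method, traj in trajectories.items():
    plt.plot(traj[:, 0], traj[:, 1], '.-', label=method)
plt.xlabel('intercept')
plt.ylabel('slope')
plt.legend()
plt.show()
```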
Hi,

On Wed, Mar 1, 2023 at 1:15 PM David Menéndez Hurtado <davidmenhur@gmail.com> wrote: [snip]

Thanks very much for doing all that - and posting the Gist.

Exploring a little further - I tried all 5 Powell-type implementations in the PDFO package; the script is at:

https://github.com/matthew-brett/powell-fails/blob/main/powell_pdfo.py

All of the implemented routines get the correct answer except 'cobyla', which fails with a suitable message:

```
 message: Return from cobyla because the objective function has been evaluated maxfev times.
  method: cobyla
    nfev: 1000
  status: 3
 success: False
       x: [2.1950324  0.48383331]
```

And, interestingly enough, our own COBYLA method gets the wrong answer on the same problem, but believes it has succeeded:

```
In [12]: spo.minimize(calc_sse, start, args=(x, y), method='COBYLA')
Out[12]:
     fun: 1.0762214436895654
   maxcv: 0.0
 message: 'Optimization terminated successfully.'
    nfev: 30
  status: 1
 success: True
       x: array([2.24641976, 0.46774596])
```

Cheers,

Matthew
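Matthew's actual script is `powell_pdfo.py` in the powell-fails repo; the following is only a sketch of the kind of loop it might contain, assuming the `pdfo` package's top-level `pdfo` function (`pip install pdfo`) and the stand-in problem from the earlier snippets.

```python
# Try each Powell-family solver in the PDFO package on the same problem.
# calc_sse, start, x, y are the stand-ins defined earlier in the thread.
from pdfo import pdfo

for method in ('uobyqa', 'newuoa', 'bobyqa', 'lincoa', 'cobyla'):
    res = pdfo(calc_sse, start, args=(x, y), method=method)
    print(f"{method:8s} success={res.success} fun={res.fun:.6f} x={res.x}")
```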
Hi,

On Wed, Mar 1, 2023 at 1:48 PM Matthew Brett <matthew.brett@lis.ac.uk> wrote: [snip]
I think I'm right that we've had our own custom implementation of Powell for a very long time - it looks to me (from git blame) that Travis O. put a version of the current code in with commit b94c30dcb (April 2002). The Powell search is failing for the data above as far back as I can conveniently test - SciPy 0.10.0, November 2011.

Cheers,

Matthew
On Wed, Mar 1, 2023 at 10:30 AM Matthew Brett <matthew.brett@lis.ac.uk> wrote:

> I think I'm right that we've had our own custom implementation of Powell for a very long time - it looks to me (from git blame) that Travis O. put a version of the current code in with commit b94c30dcb (April 2002).

For what it's worth, it's not so much "our own custom implementation" as a particular published variant that was explained in (but not originally published in) *Numerical Recipes*. Interestingly, that chapter claims that the modification is there to help Powell's method handle long skinny valleys.

-- Robert Kern
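For readers who haven't met the algorithm: below is a stripped-down sketch of the direction-set method with the *Numerical Recipes*-style direction-replacement test, written from the book's description rather than from SciPy's code (the real implementation lives in `scipy/optimize/_optimize.py`), so treat it as illustrative only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def line_min(f, p, d):
    """Minimize f along the line p + t * d; return the new point and value."""
    res = minimize_scalar(lambda t: f(p + t * d))
    return p + res.x * d, res.fun

def powell_sketch(f, x0, maxiter=100, ftol=1e-8):
    p = np.asarray(x0, dtype=float)
    dirs = list(np.eye(p.size))          # start from the coordinate axes
    fp = f(p)
    for _ in range(maxiter):
        p0, f0 = p.copy(), fp
        ibig, dbig = 0, 0.0
        # One pass of line minimizations, remembering which direction
        # gave the single largest decrease.
        for i, d in enumerate(dirs):
            p, fnew = line_min(f, p, d)
            if fp - fnew > dbig:
                ibig, dbig = i, fp - fnew
            fp = fnew
        if 2.0 * (f0 - fp) <= ftol * (abs(f0) + abs(fp)) + 1e-20:
            break
        # The modification: only adopt the overall displacement p - p0 as
        # a new search direction if an extrapolated point suggests it is
        # worthwhile; this guards against the direction set collapsing
        # into linear dependence.
        fe = f(2.0 * p - p0)
        if fe < f0:
            t = (2.0 * (f0 - 2.0 * fp + fe) * (f0 - fp - dbig) ** 2
                 - dbig * (f0 - fe) ** 2)
            if t < 0.0:
                xi = p - p0
                p, fp = line_min(f, p, xi)
                dirs[ibig] = dirs[-1]    # drop direction of largest decrease
                dirs[-1] = xi            # ...in favor of the new direction
    return p, fp
```

The `t < 0` test at the end is the "modified" part under discussion - the unmodified Powell method would always replace a direction with the overall displacement.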
Hi,

On Wed, Mar 1, 2023 at 4:22 PM Robert Kern <robert.kern@gmail.com> wrote: [snip]

Sorry - by "custom" I only meant that we didn't import the code directly from somewhere else.

I guess, from checking http://numerical.recipes/book/book.html, that this is Acton's (1970, 1990) modification of the method, referenced as:

Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington, DC: Mathematical Association of America), pp. 464-467.

Is it really true that the modification assists Powell in handling long skinny valleys? If so, does it work less well in other situations? Has anyone compared the algorithms across a good range of problems? Should we also provide an unmodified Powell?

Cheers,

Matthew
On Thu, Mar 2, 2023 at 3:02 PM Matthew Brett <matthew.brett@lis.ac.uk> wrote: [snip]

> Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington, DC: Mathematical Association of America), pp. 464-467.
One of my favorites back in the day. Worth a read, if only for his entertaining comments on the practice of numerical methods and their occasional misapplication.
Chuck
Hi,

Sorry to keep on at this one, but it is worrying me. Do we have anyone reading, or known to us, who can give a fairly definitive answer to the following?

On Thu, Mar 2, 2023 at 10:01 PM Matthew Brett <matthew.brett@lis.ac.uk> wrote: [snip]
> Is it really true that the modification assists Powell in handling long skinny valleys? If so, does it work less well in other situations? Has anyone compared the algorithms across a good range of problems?
Cheers, Matthew
Hi,

I haven't tested Powell on these (or have time to now, unfortunately), but I did previously add a test bench of optimization problems in https://github.com/scipy/scipy/pull/14579 if it's useful.

Best,

Sara
Hi all,

On 2023-03-09 00:00, Sara Fridovich-Keil wrote: [snip]
I've tried to figure out which variant of the Powell algorithm is implemented. The first few evaluated points are:

```
Powell minimization with scipy.fmin_powell:
[2.25 0.47]
[2.25 0.47]
[3.25 0.47]
[0.631966 0.47]
[2.25 0.47]
[1.63196603 0.47]
[2.631966 0.47]
[2.23860877 0.47]
[2.23849486 0.47]
[2.23872268 0.47]
[2.23860877 0.47]
[2.23860877 1.47]
[ 2.23860877 -1.148034]
[2.23860877 0.47]
```

We can see that each parameter changes more than twice, so it can't be COBYLA (constrained optimization by linear approximation). The pattern is quite similar to NEWUOA; however, that was published in 2006, and this algorithm dates from 2002, as someone wrote in an earlier response, right?
Michael J. D. Powell. The NEWUOA software for unconstrained optimization without derivatives. In: Di Pillo G., Roma M. (eds), Large-Scale Nonlinear Optimization, volume 83, pages 1247-1293. Springer, Boston, MA, 2006.
My guess was that a kind of BOBYQA is implemented: it builds a local model from the previously evaluated points and then iteratively evaluates the minimum of that quadratic model. See this figure (from my master's thesis about the NEWUOA algorithm, which is like BOBYQA with additional improvements).

But in the subsequent path, it looks as though all parameters are never changed at the same time. Likewise, in the code at https://github.com/scipy/scipy/blob/v1.10.1/scipy/optimize/_optimize.py#L334... it looks like there is iteratively a line search on one parameter - or on a linear combination given by the parameter `direc`. So the SciPy implementation is not BOBYQA and does not evaluate the minimum of the quadratic model as depicted in the graph; rather, it performs a parameter-wise quadratic optimization.

I've made some tests to strengthen this idea; see my fork of Matthew Brett's repo: scipy_issue_powell/powell_debug_trace.py at main · ChristophSchranz/scipy_issue_powell (github.com) [1]

1. The standard version fails, as before.
2. The narrow valley doesn't work, because the step width is too large. Unfortunately, we can't change the step width directly, so I scaled the parameter space by a factor of 0.1, and then it works. In the standard version the valley is so narrow that each step overshoots it, so a step in that direction is never taken.
3. `fmin_powell` has the `direc` parameter, which expects a matrix of size (N, N) that sets the initial step directions (in some way that is not entirely clear to me). If it is set to e.g. [[1, -0.5], [-0.5, 1]], the search explores the (negative) interaction between the two parameters more, and it works (see the sketch after this message).
4. Your question was whether it works for a horizontal valley. Yes, it does. I rotated the parameter space so that the valley is nearly horizontal, and it works as expected.

My interpretation: the SciPy implementation is quite old and doesn't incorporate the changes from the 2006 NEWUOA algorithm, which is quite successful on non-noisy convex optimization problems.
Luis Miguel Rios and Nikolaos V. Sahinidis. Derivative-free optimization: a review of algorithms and comparison of software implementations. Volume 53, pages 1247-1293, 2012.

Jorge J. Moré and Stefan M. Wild. Benchmarking derivative-free optimization algorithms. SIAM Journal on Optimization, volume 20, pages 172-191, 2009.
To summarize: a narrow valley is problematic if the step width in a given direction is larger than the valley itself; a step-width parameter, the `direc` parameter, or rescaling can help. I myself had problems with COBYLA, and even more with NEWUOA (quadratic optimization), on a noisy function I had to optimize. I think their strength lies in convex optimization problems.

Best,

Chris
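A sketch of point 3 from Chris's list above - passing an initial direction set to `fmin_powell`. The `direc` matrix is the one Chris mentions; the objective and start point are the stand-ins from the earlier snippets.

```python
import numpy as np
from scipy.optimize import fmin_powell

# Initial direction set: explore the negative interaction between the
# two parameters as well as the individual ones.
direc = np.array([[1.0, -0.5],
                  [-0.5, 1.0]])

xopt = fmin_powell(calc_sse, start, args=(x, y), direc=direc)
print(xopt)
```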
Links:

[1] https://github.com/ChristophSchranz/scipy_issue_powell/blob/main/powell_debu...
Hi,

Apologies for jumping in - with maybe pointless remarks - but one possible way to benchmark this behaviour may be to run the same problem with NLopt (which also has NEWUOA, BOBYQA, and COBYLA) and see if the outcome is similar.

Andrea.

On Fri, 10 Mar 2023 at 00:20, Christoph Schranz <christoph.schranz@salzburgresearch.at> wrote: [snip]
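A sketch of this suggestion using NLopt's Python bindings (`pip install nlopt`). The objective and start point are again the stand-ins from the earlier snippets; note that NLopt objectives take an `(x, grad)` signature and must return a float.

```python
import numpy as np
import nlopt

def nlopt_sse(params, grad):
    # grad is unused: these are derivative-free (LN_*) algorithms.
    return float(calc_sse(params, x, y))

for name, algo in [('NEWUOA', nlopt.LN_NEWUOA),
                   ('BOBYQA', nlopt.LN_BOBYQA),
                   ('COBYLA', nlopt.LN_COBYLA)]:
    opt = nlopt.opt(algo, 2)             # 2 parameters: intercept, slope
    opt.set_min_objective(nlopt_sse)
    opt.set_xtol_rel(1e-8)
    xbest = opt.optimize(np.array(start, dtype=float))
    print(f"{name}: x={xbest} f={opt.last_optimum_value():.6f}")
```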
participants (8)

- Andrea Gavana
- Charles R Harris
- Christoph Schranz
- David Menéndez Hurtado
- Matthew Brett
- Matthew Brett
- Robert Kern
- Sara Fridovich-Keil