Using global minimizer methods from the `optimize.minimize` function

`optimize.minimize` offers a choice of many different methods for multivariate scalar minimisation. These methods are chosen using the `method` keyword. There are also different global minimisation routines that one can use (differential_evolution, basinhopping, dual_annealing, shgo). These minimisers have the same overall objective as `minimize`, just with a different approach to finding a minimum. The global minimiser routines are called individually, and are not accessible through the `minimize` function as different methods. A PR is open at https://github.com/scipy/scipy/pull/10778 which proposes to add a `differential-evolution` method to `minimize` that would permit this. This is a fairly straightforward change as the call interfaces are almost identical, and the problems are posed in similar ways.

There are obviously pros and cons to this:

Pros
----
- One could call any of the multivariate scalar minimizers through one function.
- In user code this could simplify code significantly (code that offers all the different minimizers has to use if/elif constructs to call different functions depending on the method to be used; see the sketch below).

Cons
----
- A user may not appreciate the differences in how local and global minimisers work, e.g. a lot of the global minimisers are stochastic and some use local minimisers to polish the end solution.

Could we have a discussion as to whether people think this is a good/bad idea? Would it confuse users, would it make `minimize` too convoluted, etc?

A.
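[A minimal sketch of the user-code dispatch described above. `fit` is a hypothetical helper, not SciPy API, and the keyword handling is simplified:]

```
from scipy.optimize import minimize, differential_evolution, dual_annealing

def fit(func, x0=None, bounds=None, method='L-BFGS-B', **kws):
    # The global minimisers each have their own entry point...
    if method == 'differential_evolution':
        return differential_evolution(func, bounds, **kws)
    if method == 'dual_annealing':
        return dual_annealing(func, bounds, **kws)
    # ...while every local minimiser goes through the `minimize` front end.
    return minimize(func, x0, method=method, bounds=bounds, **kws)
```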

On Tue, Apr 28, 2020 at 6:53 PM Andrew Nelson <andyfaff@gmail.com> wrote:
> `optimize.minimize` offers a choice of many different methods for multivariate scalar minimisation. These methods are chosen using the `method` keyword.
> There are also different global minimisation routines that one can use (differential_evolution, basinhopping, dual_annealing, shgo). These minimisers have the same overall objective as `minimize`, just with a different approach to finding a minimum. The global minimiser routines are called individually, and are not accessible through the `minimize` function as different methods. A PR is open at https://github.com/scipy/scipy/pull/10778 which proposes to add a `differential-evolution` method to `minimize` that would permit this. This is a fairly straightforward change as the call interfaces are almost identical, and the problems are posed in similar ways.
> There are obviously pros and cons to this:
> Pros
> ----
> - One could call any of the multivariate scalar minimizers through one function.
> - In user code this could simplify code significantly (code that offers all the different minimizers has to use if/elif constructs to call different functions depending on the method to be used).
I sort of think these pros are overstated. The dispatching of which function to call does not seem that difficult to do (either in `minimize` or in user code). The benefit of having that dispatch of function name happen within `minimize` is small. Normalizing the APIs so that the right options are sent to the underlying methods is harder and also more valuable. That is, in order for the dispatching to really be valuable, it has to unite the calls to the underlying functions and offer a translation layer for them. The global solvers have many different optional arguments with little overlap in name or meaning. Like, 'popsize' is only used by 'differential_evolution' (see the sketch below). The plan would have to be to silently ignore keyword arguments for concepts not used by the currently selected method. I'm not sure that helps achieve clarity and simplicity. To use these methods, the user has to read the docs for the actual solver to get the many optional arguments set anyway. At that point, they can just as easily change the name of the function.
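[To illustrate the kind of solver-specific option Matt mentions: `popsize` and `seed` are real `differential_evolution` keywords, while the objective function is made up for this sketch:]

```
from scipy.optimize import differential_evolution

def f(x):
    return x[0]**2 + x[1]**2

# `popsize` has no meaning for any of the local `minimize` methods.
res = differential_evolution(f, [(-3, 3), (-3, 3)], popsize=20, seed=1)
```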
> Cons
> ----
> - A user may not appreciate the differences in how local and global minimisers work, e.g. a lot of the global minimisers are stochastic and some use local minimisers to polish the end solution.
> Could we have a discussion as to whether people think this is a good/bad idea? Would it confuse users, would it make `minimize` too convoluted, etc?
I don't think the distinction between "local" and "global" is actually that important. Well, actually, I think the label "global" is kind of misleading, as most of these methods require bounds. What they do is try to avoid getting stuck in the first minimum they find.

But, I think there is another concern that may not have been expressed yet. `x0` is a required, positional argument for `minimize()`, as an array of initial parameter values. Most of the global optimizers in scipy.optimize do not use `x0`. Instead, they require bounds and explore the range of values between those bounds. Would `x0` be required AND ignored for these global optimizers?

Cheers,
--Matt
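[A minimal sketch of the signature mismatch Matt describes, using the APIs as they stood at the time; the objective function is made up:]

```
from scipy.optimize import minimize, differential_evolution

def f(x):
    return (x[0] - 1)**2 + (x[1] + 2)**2

# Local minimisation: `x0` is required, `bounds` is optional.
res_local = minimize(f, x0=[0.0, 0.0])

# Global minimisation: `bounds` is required and there is no `x0` argument.
res_global = differential_evolution(f, bounds=[(-5, 5), (-5, 5)])
```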

> I sort of think these pros are overstated. The dispatching of which function to call does not seem that difficult to do (either in `minimize` or in user code). The benefit of having that dispatch of function name happen within `minimize` is small. Normalizing the APIs so that the right options are sent to the underlying methods is harder and also more valuable. That is, in order for the dispatching to really be valuable, it has to unite the calls to the underlying functions and offer a translation layer for them.
A recent issue (https://github.com/scipy/scipy/issues/11956) highlighted this. There the author wanted to use a constraints dict with differential_evolution, similar to the one that can be provided to `minimize` (differential_evolution uses the new-style `NonlinearConstraint`). If differential_evolution were a `minimize` method, that translation would be done automatically. The same argument applies to translation between new-style and old-style bounds. Is that what you mean by normalising the API?
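[A minimal sketch of the translation Andrew describes, mapping an old-style constraint dict onto the new-style object; `translate_constraint` is a hypothetical helper, not SciPy code:]

```
import numpy as np
from scipy.optimize import NonlinearConstraint

def translate_constraint(con):
    # Old style: {'type': 'ineq', 'fun': f} means f(x) >= 0,
    #            {'type': 'eq',   'fun': g} means g(x) == 0.
    if con['type'] == 'ineq':
        return NonlinearConstraint(con['fun'], 0, np.inf)
    if con['type'] == 'eq':
        return NonlinearConstraint(con['fun'], 0, 0)
    raise ValueError("unknown constraint type %r" % con['type'])
```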
> The global solvers have many different optional arguments with little overlap in name or meaning. Like, 'popsize' is only used by 'differential_evolution'. The plan would have to be to silently ignore keyword arguments for concepts not used by the currently selected method. I'm not sure that helps achieve clarity and simplicity. To use these methods, the user has to read the docs for the actual solver to get the many optional arguments set anyway. At that point, they can just as easily change the name of the function.
There are many optional arguments for each of the methods as-is. The most common are `jac` and `hess`, which are used across some, but not all, methods. L-BFGS-B has `iprint`, `gtol` and `maxls`, which aren't used by most other methods. Over-supply/non-supply of these keywords (i.e. concepts not used by the specified method) is already handled by `minimize`, and by the minimizers themselves (via `_check_unknown_options`; see the sketch below). Your line of reasoning runs counter to the design of the `minimize` function, and would suggest a return to the old-style minimize functions: fmin, fmin_l_bfgs_b, etc. (The documentation states "The functions below are not recommended for use in new scripts; all of these methods are accessible via a newer, more consistent interface, provided by the interfaces above.")
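[Roughly how that over-supply handling works — a paraphrased sketch of the behaviour, not SciPy's exact implementation:]

```
from warnings import warn
from scipy.optimize import OptimizeWarning

def check_unknown_options(unknown_options):
    # Each solver pops the options it understands; whatever is left over
    # triggers a warning rather than a hard error.
    if unknown_options:
        names = ", ".join(map(str, unknown_options.keys()))
        warn("Unknown solver options: %s" % names, OptimizeWarning)
```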
> But, I think there is another concern that may not have been expressed yet. `x0` is a required, positional argument for `minimize()`, as an array of initial parameter values. Most of the global optimizers in scipy.optimize do not use `x0`. Instead, they require bounds and explore the range of values between those bounds. Would `x0` be required AND ignored for these global optimizers?
The call signature for required positional arguments for `minimize` is different to that of the global optimizers. Being able to call the global minimizers via `minimize` would alleviate that. As you say, the global minimizers explore within bounds and don't use an `x0`. The PR (as it currently exists) would still require `x0`, and it would be ignored. It would be possible to change that behaviour for `differential_evolution`, but it would require modifying the code to accept an `x0` keyword.

Just wanted to mention that I'm in favor of adding the option of accessing `differential_evolution` via minimize; my comments are at the PR <https://github.com/scipy/scipy/pull/10778>.
--
Matt Haberland
Assistant Professor
BioResource and Agricultural Engineering
08A-3K, Cal Poly

On Thu, Apr 30, 2020 at 4:43 AM Andrew Nelson <andyfaff@gmail.com> wrote:
> The call signature for required positional arguments for `minimize` is different to that of the global optimizers. Being able to call the global minimizers via `minimize` would alleviate that. As you say, the global minimizers explore within bounds and don't use an `x0`. The PR (as it currently exists) would still require `x0`, and it would be ignored. It would be possible to change that behaviour for `differential_evolution`, but it would require modifying the code to accept an `x0` keyword.
Hmm, the signature issue seems quite problematic. The opposite is also true: `bounds` is optional for minimize() and cannot be made non-optional, whereas for the global optimizers it is required.

Ralf

On Thu, 30 Apr 2020 at 14:25, Ralf Gommers <ralf.gommers@gmail.com> wrote:
> Hmm, the signature issue seems quite problematic. The opposite is also true: `bounds` is optional for minimize() and cannot be made non-optional, whereas for the global optimizers it is required.
The call would be:

```
minimize(func, x0, bounds=bounds, method='differential-evolution')
```

Inside `minimize` it would look something like:

```
res = differential_evolution(func, bounds)
```

If bounds is None then an error would be raised, either from the `minimize` function or the underlying `differential_evolution` function. At the moment the `x0` supplied to `minimize` would be ignored by the underlying differential_evolution function, but it would be possible to amend differential_evolution to use `x0` as well. With this proposal the `minimize` signature wouldn't change, and neither would that of `differential_evolution` (with the possible exception of adding an `x0` keyword to use an initial position).
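[A hedged sketch of what that branch inside `minimize` might look like — an assumed shape, not the PR's actual code; `_minimize_differential_evolution` is a hypothetical internal name:]

```
from scipy.optimize import differential_evolution

def _minimize_differential_evolution(func, x0, bounds=None, **options):
    if bounds is None:
        raise ValueError("method 'differential-evolution' requires bounds")
    # x0 is accepted for signature compatibility but currently ignored;
    # the solver samples within `bounds` instead.
    return differential_evolution(func, bounds, **options)
```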

On Wed, Apr 29, 2020 at 9:44 PM Andrew Nelson <andyfaff@gmail.com> wrote:
>> I sort of think these pros are overstated. The dispatching of which function to call does not seem that difficult to do (either in `minimize` or in user code). The benefit of having that dispatch of function name happen within `minimize` is small. Normalizing the APIs so that the right options are sent to the underlying methods is harder and also more valuable. That is, in order for the dispatching to really be valuable, it has to unite the calls to the underlying functions and offer a translation layer for them.
> A recent issue (https://github.com/scipy/scipy/issues/11956) highlighted this. There the author wanted to use a constraints dict with differential_evolution, similar to the one that can be provided to `minimize` (differential_evolution uses the new-style `NonlinearConstraint`). If differential_evolution were a `minimize` method, that translation would be done automatically. The same argument applies to translation between new-style and old-style bounds. Is that what you mean by normalising the API?
It would be? Or rather: how would it be done without having it also work with the standalone `differential_evolution` function?
>> The global solvers have many different optional arguments with little overlap in name or meaning. Like, 'popsize' is only used by 'differential_evolution'. The plan would have to be to silently ignore keyword arguments for concepts not used by the currently selected method. I'm not sure that helps achieve clarity and simplicity. To use these methods, the user has to read the docs for the actual solver to get the many optional arguments set anyway. At that point, they can just as easily change the name of the function.
> There are many optional arguments for each of the methods as-is. The most common are `jac` and `hess`, which are used across some, but not all, methods. L-BFGS-B has `iprint`, `gtol` and `maxls`, which aren't used by most other methods. Over-supply/non-supply of these keywords (i.e. concepts not used by the specified method) is already handled by `minimize`, and by the minimizers themselves (via `_check_unknown_options`). Your line of reasoning runs counter to the design of the `minimize` function, and would suggest a return to the old-style minimize functions: fmin, fmin_l_bfgs_b, etc. (The documentation states "The functions below are not recommended for use in new scripts; all of these methods are accessible via a newer, more consistent interface, provided by the interfaces above.")
Yes, you are correct that my line of reasoning runs counter to the `minimize` function. Well, I would say not so much "counter" as seeing some value with `minimize`, but also seeing some value with the old style too. I am definitely not in favor of documentation that claims working functions are deprecated in favor of a multi-dispatch function. Somehow, `minimize` and the deprecation of `fmin`, etc. grates on me much less than the alleged deprecation of `leastsq` in favor of `least_squares`, which is a nest of interdependent options. `minimize` does not seem as bad (but it appears that you're working on it ;)).

My point is that the dispatching of the function names is not, by itself, really that big of a win. Functions are first-class objects (see the sketch below). At least with most of the current solvers covered by `minimize`, most of the non-uniformly-named options are also truly optional (except, for some methods, `jac` -- but at least that is more or less a uniform name) and "advanced options for fine tuning". My sense is that the uniquely named arguments to the global solvers are more important, and are less like "advanced options".
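[A minimal sketch of the "functions are first-class objects" point: user code can dispatch by passing solvers around directly, with no `method` string needed; the `solvers` table is illustrative, not SciPy API:]

```
from scipy.optimize import differential_evolution, dual_annealing

solvers = {'de': differential_evolution, 'da': dual_annealing}

def run(name, func, bounds, **kws):
    # Look the solver up and call it; no if/elif ladder required.
    return solvers[name](func, bounds, **kws)
```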
>> But, I think there is another concern that may not have been expressed yet. `x0` is a required, positional argument for `minimize()`, as an array of initial parameter values. Most of the global optimizers in scipy.optimize do not use `x0`. Instead, they require bounds and explore the range of values between those bounds. Would `x0` be required AND ignored for these global optimizers?
> The call signature for required positional arguments for `minimize` is different to that of the global optimizers. Being able to call the global minimizers via `minimize` would alleviate that. As you say, the global minimizers explore within bounds and don't use an `x0`. The PR (as it currently exists) would still require `x0`, and it would be ignored.
If I understand correctly, with the proposed changes, I hope you would have to continue supporting `minimize(objective, x0, method='Nelder-Mead')` to work as it currently does: x0 required. To switch to solving with `differential_evolution`, the user would have to do `minimize(objective, x0, bounds=bounds, method='differential_evolution')`.

Now, although `bounds` is a keyword argument, it is actually required for the method to work. And `x0` is a required positional argument, but the value is ignored. That seems profoundly weird to me. Are there other examples in scipy (outside of scipy.optimize) for which

a) a required, positional argument has a value that is ignored when an optional keyword argument has some value(s)?
b) a keyword argument is changed from optional to required due to the value of a different keyword argument?

Each of these seems like a problem to me. And, yes, b) is currently the case for `minimize`: some values of `method` require a `jac` option, while other values of `method` do not use `jac`. So, yes, I think the idea of putting all of `scipy.optimize` into a single function is an understandable desire but also sort of a design mistake. If important keyword arguments are different and not translatable, I don't see why `minimize(objective, x0, method='solver', **kws)` is actually better than `solver(objective, x0, **kws)`.

For sure, having uniform keyword arguments and using common infrastructure to, say, have a common way of setting bounds and constraints is valuable. Or (ahem), make Parameters objects that have names and bounds, etc.... Or if there were a Minimizer class with attributes and methods, maybe having a `method` attribute would make sense. Either way, if you're looking to improve the uniformity or the ability of downstream code to use the functions as if they were an API, then `OptimizeResult` really ought to include the name of the method used.

Cheers,
--Matt
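[A hedged sketch of Matt's last suggestion. `OptimizeResult` is a dict subclass, so downstream code has to record the method itself; the `method` key shown here is the proposed addition, not existing behaviour at the time:]

```
from scipy.optimize import minimize, rosen

method = 'Nelder-Mead'
res = minimize(rosen, [1.3, 0.7], method=method)
# What a downstream library must do manually today:
res['method'] = method
print(res.method, res.x)
```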

> My sense is that the uniquely named arguments to the global solvers are more important, and are less like "advanced options".
When I use the global minimisers I hardly ever change those tuning options from their defaults.
> If I understand correctly, with the proposed changes, I hope you would have to continue supporting `minimize(objective, x0, method='Nelder-Mead')`
You are correct. There would be no behavioural change for any of the existing methods.

> minimize(objective, x0, bounds=bounds, method='differential_evolution')
> Now, although `bounds` is a keyword argument, it is actually required for the method to work. And `x0` is a required positional argument, but the value is ignored. That seems profoundly weird to me.
`bounds` would be required for the method to work. As you mention, this is no different from 'newton-cg', which requires `jac` for that method to work. `x0` would still be a required positional argument for `minimize`. It would be ignored for the `differential-evolution` method. However, it is also possible to change the underlying `differential_evolution` function to use an initial `x0` guess. Those guesses are less important for the global minimisers.
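[For comparison, the existing precedent Andrew points to, sketched with SciPy's built-in Rosenbrock helpers:]

```
from scipy.optimize import minimize, rosen, rosen_der

# `jac` is an optional keyword of `minimize` in general, but the
# 'Newton-CG' method raises an error if it is not supplied.
res = minimize(rosen, [1.3, 0.7], method='Newton-CG', jac=rosen_der)
```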
> Are there other examples in scipy (outside of scipy.optimize) for which
> a) a required, positional argument has a value that is ignored when an optional keyword argument has some value(s)?
> b) a keyword argument is changed from optional to required due to the value of a different keyword argument?
I am less familiar with other areas of scipy.

> Either way, if you're looking to improve the uniformity or the ability of downstream code to use the functions as if they were an API, then `OptimizeResult` really ought to include the name of the method used.
That's a good suggestion.
Participants: Andrew Nelson, Matt Haberland, Matt Newville, Ralf Gommers