Adding a multistart algorithm to global optimizers in scipy.optimize
Hello everyone, in GitHub issue #6381<https://github.com/scipy/scipy/issues/6381> I suggested to add a global optimizer algorithm to scipy.optimize. As @andyfaff pointed out I should rather started the discussion here. As I already have some feedback by @andyfaff, I'll copy-paste my idea and his reply here. I'll also add my comments on his reply below. My issue: I regularly come across the following task: To find a the global minimum of a function I run local optimizers from multiple random initial guesses. For some situations, e.g. fitting parameters in systems biology, it was found that this type of algorithm performs better than others (Raue et al., 2013<http://dx.doi.org/10.1371/journal.pone.0074335>). I wonder if there is any interest to include such an algorithm into scipy. I wrote some code, documentation and 2 examples here: http://nbviewer.jupyter.org/github/fabianrost84/multistart-minimize/blob/mas... @andyfaff's reply: This is possibly something that would've been better to raise on the mailing list first. There are several approaches to the type of problem you described. Firstly (and probably best), you can use one of the global optimizers, optimize.basinhopping or optimize.differential_evolution, they are designed for this very task. They will be a lot more effective than brute search followed by local minimisation as the dimensionality of the problem increases. Remember that you can do brute search and polish using the optimize.brute, although the grid search is done over a regular grid. Finally, and this is not well known, you can do a brute search and polish (of the best solution) with differential_evolution. Here the brute search would examine the problem space by latin hypercube<https://en.wikipedia.org/wiki/Latin_hypercube_sampling> sampling. You would probably want to increase the popsize by a bit. By setting maxiter to 0 no evolution is done on the population, and the best solution from the hypercube sampling would be polished with L-BFGS-B. I do appreciate that isn't the same as performing a local minimisation of all the brute search locations, but I don't think that is going to outperform differential_evolution or basinhopping in the first place. basinhopping does do a local minimisation step as part of it's operation anyway. my reply to that: Thanks very much for your comments! You point out that basinhopping or differential evolution will be a lot more effective than and that they probably will outperform a multistart algorithm. Concerning that I'd like to point to the paper Raue et al., 2013<http://dx.doi.org/10.1371/journal.pone.0074335> once more. One of there problems is 11-dimensional. They compare the efficiency of multiple optimizers among them differential evolution and a multistart optimizer. I think they do not include basinhooping or simulated annealing. They find that their multistart optimizer outperform all other tested optimizers for the problem at hand. I should also mention they perform there study using MatLab, not Python. I found another study performed in Mathematica<http://library.wolfram.com/infocenter/Conferences/5824/#downloads>. This study finds optimization problems for which a multistart algorithm gives better results than simulated annealing or differential evolution. So doesn't that mean your point concerning effectiveness does not hold in general? I was aware of brute, I was not aware of the brute search with differential_evolution. Thanks. Let me mention that the latin hypercube sampling points could of course also be used as initials for multistart. That's what's done by Raue et al., 2013<http://dx.doi.org/10.1371/journal.pone.0074335> and it's probably better than my simple random uniform implementation. Let me also mention that a multistart algorithms are available in Excel's solver add-on<http://www.solver.com/excel-solver-multistart-methods-global-optimization>, in MatLab<https://www.mathworks.com/help/gads/multistart-class.html?s_tid=gn_loc_drop>, in R<http://www.inside-r.org/packages/cran/BB/docs/multiStart> and in Mathematica<https://www.wolfram.com/products/applications/mathoptpro/>, but not for scipy. Best, Fabian -- -------------------------------- Fabian Rost (Dipl.-Phys.) Wissenschaftlicher Mitarbeiter Technische Universität Dresden Centre for Information Services and High Performance Computing (ZIH) Dept. for Innovative Methods of Computing (IMC) 01062 Dresden Tel.: +49 (351) 463-38780 Fax: +49 (351) 463-38245 E-Mail: fabian.rost@tu-dresden.de Web: http://imc.zih.tu-dresden.de/ --------------------------------
So, their lack of comparison with basinhopping is unfortunate, because what you're describing is basically what basinhopping is for: it starts a local optimizer at a point, runs it downhill, then jumps to another starting point to see if it does better. So your algorithm is roughly equivalent to basinhopping with the "step" function being "choose a new point completely at random". But in fact for basinhopping or most other global optimizers, it's a good habit to run many starting points anyway, which can be done in parallel. So in this setting, basinhopping shifts some of the work from completely random starting points to those near other minima. You can always set the maximum number of iterations to 1. It wouldn't hurt to set up a convenient wrapper that starts many global optimizations in parallel at random points; they need not even all use the same algorithm, and accepting an Executor object (from concurrent.futures) allows you to achieve parallelism using whatever underlying implementation the user prefers: thread-based, process-based, or in fact ipython provides an executor that distributes work across the ipython cluster. The only headache is that concurrent.futures is not standard in python2, although there is a backport. Somewhere I have a trivial shim of a SerialExecutor that lets you use the interface without needing concurrent.futures as long ass you don't actually want parallelism. Anne On Mon, Jul 18, 2016 at 11:07 AM Fabian Rost <fabian.rost@tu-dresden.de> wrote:
Hello everyone,
in GitHub issue #6381 <https://github.com/scipy/scipy/issues/6381> I suggested to add a global optimizer algorithm to scipy.optimize. As @andyfaff pointed out I should rather started the discussion here. As I already have some feedback by @andyfaff, I'll copy-paste my idea and his reply here. I'll also add my comments on his reply below.
My issue:
I regularly come across the following task: To find a the global minimum of a function I run local optimizers from multiple random initial guesses. For some situations, e.g. fitting parameters in systems biology, it was found that this type of algorithm performs better than others (Raue et al., 2013 <http://dx.doi.org/10.1371/journal.pone.0074335>). I wonder if there is any interest to include such an algorithm into scipy. I wrote some code, documentation and 2 examples here:
http://nbviewer.jupyter.org/github/fabianrost84/multistart-minimize/blob/mas...
@andyfaff's reply:
This is possibly something that would've been better to raise on the mailing list first. There are several approaches to the type of problem you described. Firstly (and probably best), you can use one of the global optimizers, optimize.basinhopping or optimize.differential_evolution, they are designed for this very task. They will be a lot more effective than brute search followed by local minimisation as the dimensionality of the problem increases. Remember that you can do brute search and polish using the optimize.brute, although the grid search is done over a regular grid. Finally, and this is not well known, you can do a brute search and polish (of the best solution) with differential_evolution. Here the brute search would examine the problem space by latin hypercube <https://en.wikipedia.org/wiki/Latin_hypercube_sampling> sampling. You would probably want to increase the popsize by a bit. By setting maxiter to 0 no evolution is done on the population, and the best solution from the hypercube sampling would be polished with L-BFGS-B.
I do appreciate that isn't the same as performing a local minimisation of all the brute search locations, but I don't think that is going to outperform differential_evolution or basinhopping in the first place. basinhopping does do a local minimisation step as part of it's operation anyway.
my reply to that:
Thanks very much for your comments!
You point out that basinhopping or differential evolution will be a lot more effective than and that they probably will outperform a multistart algorithm.
Concerning that I'd like to point to the paper Raue et al., 2013 <http://dx.doi.org/10.1371/journal.pone.0074335> once more. One of there problems is 11-dimensional. They compare the efficiency of multiple optimizers among them differential evolution and a multistart optimizer. I think they do not include basinhooping or simulated annealing. They find that their multistart optimizer outperform all other tested optimizers for the problem at hand. I should also mention they perform there study using MatLab, not Python. I found another study performed in Mathematica <http://library.wolfram.com/infocenter/Conferences/5824/#downloads>. This study finds optimization problems for which a multistart algorithm gives better results than simulated annealing or differential evolution. So doesn't that mean your point concerning effectiveness does not hold in general?
I was aware of brute, I was not aware of the brute search with differential_evolution. Thanks. Let me mention that the latin hypercube sampling points could of course also be used as initials for multistart. That's what's done by Raue et al., 2013 <http://dx.doi.org/10.1371/journal.pone.0074335> and it's probably better than my simple random uniform implementation.
Let me also mention that a multistart algorithms are available in Excel's solver add-on <http://www.solver.com/excel-solver-multistart-methods-global-optimization>, in MatLab <https://www.mathworks.com/help/gads/multistart-class.html?s_tid=gn_loc_drop>, in R <http://www.inside-r.org/packages/cran/BB/docs/multiStart> and in Mathematica <https://www.wolfram.com/products/applications/mathoptpro/>, but not for scipy.
Best,
Fabian
-- -------------------------------- Fabian Rost (Dipl.-Phys.) Wissenschaftlicher Mitarbeiter Technische Universität Dresden Centre for Information Services and High Performance Computing (ZIH) Dept. for Innovative Methods of Computing (IMC) 01062 Dresden Tel.: +49 (351) 463-38780 Fax: +49 (351) 463-38245 E-Mail: fabian.rost@tu-dresden.de Web: http://imc.zih.tu-dresden.de/ -------------------------------- _______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
Actually, a wrapper that starts many optimizations at random points would not only work for global optimizers but also for local optimizers. However, this wrapper would be limited to optimizers that use an initial guess (optimize.brute and differential_evolution would be excluded). Furthermore, this wrapper would turn any local optimizer into a global optimizer. In fact, the code I suggested here<http://nbviewer.jupyter.org/github/fabianrost84/multistart-minimize/blob/mas...> is just a wrapper around optimize.minimize. I realized that the documentation of optimize.basinhopping suggests that "the algorithm can be run from a number of different random starting points". So a convenient wrapper to do this definitely would not hurt. I wonder what this wrapper should return. It could return just the global optimum. But for analysis it might also be good to get all the OptimizeResults. This could help to judge how many local optima there are and if more or less initial guesses are needed to find the global optimum. Parallelization is certainly a good idea. Fabian ________________________________ Von: SciPy-Dev <scipy-dev-bounces@scipy.org> im Auftrag von Anne Archibald <peridot.faceted@gmail.com> Gesendet: Montag, 18. Juli 2016 13:31 An: SciPy Developers List Betreff: Re: [SciPy-Dev] Adding a multistart algorithm to global optimizers in scipy.optimize So, their lack of comparison with basinhopping is unfortunate, because what you're describing is basically what basinhopping is for: it starts a local optimizer at a point, runs it downhill, then jumps to another starting point to see if it does better. So your algorithm is roughly equivalent to basinhopping with the "step" function being "choose a new point completely at random". But in fact for basinhopping or most other global optimizers, it's a good habit to run many starting points anyway, which can be done in parallel. So in this setting, basinhopping shifts some of the work from completely random starting points to those near other minima. You can always set the maximum number of iterations to 1. It wouldn't hurt to set up a convenient wrapper that starts many global optimizations in parallel at random points; they need not even all use the same algorithm, and accepting an Executor object (from concurrent.futures) allows you to achieve parallelism using whatever underlying implementation the user prefers: thread-based, process-based, or in fact ipython provides an executor that distributes work across the ipython cluster. The only headache is that concurrent.futures is not standard in python2, although there is a backport. Somewhere I have a trivial shim of a SerialExecutor that lets you use the interface without needing concurrent.futures as long ass you don't actually want parallelism. Anne On Mon, Jul 18, 2016 at 11:07 AM Fabian Rost <fabian.rost@tu-dresden.de<mailto:fabian.rost@tu-dresden.de>> wrote: Hello everyone, in GitHub issue #6381<https://github.com/scipy/scipy/issues/6381> I suggested to add a global optimizer algorithm to scipy.optimize. As @andyfaff pointed out I should rather started the discussion here. As I already have some feedback by @andyfaff, I'll copy-paste my idea and his reply here. I'll also add my comments on his reply below. My issue: I regularly come across the following task: To find a the global minimum of a function I run local optimizers from multiple random initial guesses. For some situations, e.g. fitting parameters in systems biology, it was found that this type of algorithm performs better than others (Raue et al., 2013<http://dx.doi.org/10.1371/journal.pone.0074335>). I wonder if there is any interest to include such an algorithm into scipy. I wrote some code, documentation and 2 examples here: http://nbviewer.jupyter.org/github/fabianrost84/multistart-minimize/blob/mas... @andyfaff's reply: This is possibly something that would've been better to raise on the mailing list first. There are several approaches to the type of problem you described. Firstly (and probably best), you can use one of the global optimizers, optimize.basinhopping or optimize.differential_evolution, they are designed for this very task. They will be a lot more effective than brute search followed by local minimisation as the dimensionality of the problem increases. Remember that you can do brute search and polish using the optimize.brute, although the grid search is done over a regular grid. Finally, and this is not well known, you can do a brute search and polish (of the best solution) with differential_evolution. Here the brute search would examine the problem space by latin hypercube<https://en.wikipedia.org/wiki/Latin_hypercube_sampling> sampling. You would probably want to increase the popsize by a bit. By setting maxiter to 0 no evolution is done on the population, and the best solution from the hypercube sampling would be polished with L-BFGS-B. I do appreciate that isn't the same as performing a local minimisation of all the brute search locations, but I don't think that is going to outperform differential_evolution or basinhopping in the first place. basinhopping does do a local minimisation step as part of it's operation anyway. my reply to that: Thanks very much for your comments! You point out that basinhopping or differential evolution will be a lot more effective than and that they probably will outperform a multistart algorithm. Concerning that I'd like to point to the paper Raue et al., 2013<http://dx.doi.org/10.1371/journal.pone.0074335> once more. One of there problems is 11-dimensional. They compare the efficiency of multiple optimizers among them differential evolution and a multistart optimizer. I think they do not include basinhooping or simulated annealing. They find that their multistart optimizer outperform all other tested optimizers for the problem at hand. I should also mention they perform there study using MatLab, not Python. I found another study performed in Mathematica<http://library.wolfram.com/infocenter/Conferences/5824/#downloads>. This study finds optimization problems for which a multistart algorithm gives better results than simulated annealing or differential evolution. So doesn't that mean your point concerning effectiveness does not hold in general? I was aware of brute, I was not aware of the brute search with differential_evolution. Thanks. Let me mention that the latin hypercube sampling points could of course also be used as initials for multistart. That's what's done by Raue et al., 2013<http://dx.doi.org/10.1371/journal.pone.0074335> and it's probably better than my simple random uniform implementation. Let me also mention that a multistart algorithms are available in Excel's solver add-on<http://www.solver.com/excel-solver-multistart-methods-global-optimization>, in MatLab<https://www.mathworks.com/help/gads/multistart-class.html?s_tid=gn_loc_drop>, in R<http://www.inside-r.org/packages/cran/BB/docs/multiStart> and in Mathematica<https://www.wolfram.com/products/applications/mathoptpro/>, but not for scipy. Best, Fabian -- -------------------------------- Fabian Rost (Dipl.-Phys.) Wissenschaftlicher Mitarbeiter Technische Universität Dresden Centre for Information Services and High Performance Computing (ZIH) Dept. for Innovative Methods of Computing (IMC) 01062 Dresden Tel.: +49 (351) 463-38780 Fax: +49 (351) 463-38245 E-Mail: fabian.rost@tu-dresden.de<mailto:fabian.rost@tu-dresden.de> Web: http://imc.zih.tu-dresden.de/ -------------------------------- _______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org<mailto:SciPy-Dev@scipy.org> https://mail.scipy.org/mailman/listinfo/scipy-dev
On Mon, Jul 18, 2016 at 2:03 PM Fabian Rost <fabian.rost@tu-dresden.de> wrote:
Actually, a wrapper that starts many optimizations at random points would not only work for global optimizers but also for local optimizers. However, this wrapper would be limited to optimizers that use an initial guess (optimize.brute and differential_evolution would be excluded). Furthermore, this wrapper would turn any local optimizer into a global optimizer. In fact, the code I suggested here <http://nbviewer.jupyter.org/github/fabianrost84/multistart-minimize/blob/mas...> is just a wrapper around optimize.minimize.
I realized that the documentation of optimize.basinhopping suggests that "the algorithm can be run from a number of different random starting points". So a convenient wrapper to do this definitely would not hurt.
I wonder what this wrapper should return. It could return just the global optimum. But for analysis it might also be good to get all the OptimizeResults. This could help to judge how many local optima there are and if more or less initial guesses are needed to find the global optimum.
Parallelization is certainly a good idea.
What to return is a question the other global optimizers have already had to face. The basinhopping code just returns the global optimum, I believe, but also provides a callback that allows users to see each local minimum. Presumably they can also stop the global optimization if they're happy, by raising an exception if nothing else will work. This may require some thought in the context of parallelization, since it's not so easy to send results back from subprocesses. It might be necessary to "open the hood" of basinhopping and run just the local optimizations in subprocesses. Obviously not everyone is going to want parallelization, but global optimization tends to be a "big" problem, and making it easy for people to use parallelization, to the degree and in the way that they prefer, would make the code in scipy much more useful. For really complex and difficult optimizations, of course, people are going to want to switch to more elaborate specialized codes. But it we make that less necessary, so much the better. Anne
participants (2)
-
Anne Archibald -
Fabian Rost