Re: [SciPy-Dev] GSoC'21 participation SciPy
Hi,

Thank you for putting this together! I have some ideas for the idea pool :)

scipy.optimize: Would it be desirable to have workers evaluate the function during an optimization? In most industrial contexts, the function is not trivial and might require minutes, if not hours or even days, to compute. Having a simple way to parallelise the runs would help. We have machines with easily ten cores now and it would be great to leverage this here. Going in that direction, having a more general infrastructure to handle external workers would be great. Sure, there are external packages to do this, but then it's not so trivial if you want to use SciPy's optimizers.

scipy.optimize: What about another optimization method such as EGO (Efficient Global Optimization)? This would require a Gaussian process regressor.

scipy.stats: there is an ANOVA section in the roadmap, but is sensitivity analysis in general something that would be of interest? I am thinking about Sobol' indices (not related to the Sobol' sequence, but from the same author), moment-based indices, Shapley values, cusunoro, etc.

scipy.metamodel: last but not least, a metamodel/response-surface module. This is linked to the optimization or sensitivity analysis of expensive functions. It would be sufficient to have Gaussian processes and polynomial chaos expansion. It could also include more general things like linear regression or other things in scipy.interpolate.

Cheers,
Pamphile @tupui
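(Since Sobol' indices come up here, a minimal Monte Carlo sketch of the classic "pick-freeze" first-order estimator may help readers unfamiliar with them. The helper name `sobol_first_order` is invented for illustration and is not an existing SciPy function.)

```python
import numpy as np

def sobol_first_order(func, dim, n=100_000, seed=0):
    """Monte Carlo "pick-freeze" estimate of first-order Sobol' indices.

    Hypothetical illustration -- not a SciPy API. `func` must accept an
    (n, dim) array of inputs drawn uniformly from [0, 1)^dim.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n, dim))
    B = rng.random((n, dim))
    fA = func(A)
    var = fA.var()
    S = []
    for i in range(dim):
        ABi = B.copy()
        ABi[:, i] = A[:, i]        # freeze coordinate i, resample the rest
        fABi = func(ABi)
        # Cov(f(A), f(AB_i)) / Var(f) estimates the index S_i
        S.append(np.mean(fA * fABi) - fA.mean() * fABi.mean())
    return np.array(S) / var

# Toy model with known indices: f = 2*x1 + x2 gives S = (0.8, 0.2).
def linear(X):
    return 2.0 * X[:, 0] + X[:, 1]

S = sobol_first_order(linear, dim=2)
```

For this additive toy model the estimates land close to the analytic values 0.8 and 0.2, which is a quick sanity check on the estimator.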
On Mon, Feb 15, 2021 at 6:24 PM Pamphile Roy <roy.pamphile@gmail.com> wrote:
Hi,
Thank you for putting this together!
I have some ideas for the idea pool :)
Thanks Pamphile!
*scipy.optimize:* Would it be desirable to have workers evaluate the function during an optimization? In most industrial contexts, the function is not trivial and might require minutes, if not hours or even days, to compute. Having a simple way to parallelise the runs would help. We have machines with easily ten cores now and it would be great to leverage this here.
Definitely - see the mention of workers under http://scipy.github.io/devdocs/roadmap.html#performance-improvements.

Going in that direction, having a more general infrastructure to handle external workers would be great. Sure, there are external packages to do this, but then it's not so trivial if you want to use SciPy's optimizers.

I'm assuming you mean something like standard multiprocessing, or using a custom Pool object, for code that's trivially parallelizable. Both are covered by the `workers` pattern. If you're thinking about something else, can you elaborate?
*scipy.optimize:* What about another optimization method such as EGO? This would require a Gaussian process regressor.
In general we'd like to continue adding high-quality optimization methods if they bring something extra - see https://mail.python.org/pipermail/scipy-dev/2021-January/024489.html. Not sure about EGO in particular (I'm not familiar with it); Gaussian processes sound a little out of scope - that's probably scikit-learn territory.
*scipy.stats:* there is an ANOVA section in the roadmap, but is sensitivity analysis in general something that would be of interest? I am thinking about Sobol' indices (not related to the Sobol' sequence, but from the same author), moment-based indices, Shapley values, cusunoro, etc.
I'm not 100% sure; let's see if someone more familiar with this topic has an opinion. In general, for new stats functionality we try to figure out whether it fits better in scipy.stats or in statsmodels. The latter doesn't have much either right now, only: https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_...
*scipy.metamodel:* last but not least, a metamodel/response-surface module. This is linked to the optimization or sensitivity analysis of expensive functions. It would be sufficient to have Gaussian processes and polynomial chaos expansion. It could also include more general things like linear regression or other things in scipy.interpolate.
That is out of scope I'd say - too specific for a new submodule. At the very least it should start as a separate package first.

Cheers,
Ralf
On Mon, Feb 15, 2021 at 2:02 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
A standard approach for this is to organize the implementation of the optimization algorithms in what's usually called an "ask-tell" interface. The minimize()-style interface is easy to implement from an ask-tell interface, but not vice versa. Basically, you have the optimizer object expose two methods: ask(), which returns the next point to evaluate, and tell(), where you feed back the point and its evaluated function value. You're in charge of evaluating that function. This gives you a lot of flexibility in how to dispatch that function evaluation, and importantly, we don't have to commit to any dependencies! That's the user's job!

scikit-optimize implements their optimizers in this style, for example. It's pretty common for optimizers that are geared towards expensive evaluations.
https://scikit-optimize.github.io/stable/auto_examples/ask-and-tell.html

I think it might be a well-scoped GSoC project to start re-implementing a chosen handful of the algorithms in scipy.optimize in such an interface. It could even be a trial run as an external package (even in scikit-optimize, if they're amenable). Then we can evaluate whether we want to adopt that framework inside scipy.optimize and make a roadmap for re-implementing all of the algorithms in that style. It will be a technical challenge to adapt the FORTRAN-implemented algorithms to such an interface.

I will not be available to mentor such a project, but that's the general approach that I would recommend. I think it would be a valuable addition.

--
Robert Kern
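(For concreteness, here is a minimal sketch of the ask-tell pattern described above: a toy random-search minimizer. The class and names are invented for illustration; scikit-optimize's actual API differs in detail.)

```python
import random

class AskTellRandomSearch:
    """Toy ask-tell optimizer: random search inside box bounds.

    Invented for illustration -- not a SciPy or scikit-optimize API.
    """
    def __init__(self, bounds, seed=0):
        self.bounds = bounds
        self.rng = random.Random(seed)
        self.best_x, self.best_f = None, float("inf")

    def ask(self):
        # Propose the next point; the *caller* evaluates the function,
        # however it likes (serially, multiprocessing, a cluster, ...).
        return [self.rng.uniform(lo, hi) for lo, hi in self.bounds]

    def tell(self, x, fx):
        # Feed the evaluated value back into the optimizer's state.
        if fx < self.best_f:
            self.best_x, self.best_f = x, fx

def sphere(x):
    return sum(xi * xi for xi in x)

opt = AskTellRandomSearch(bounds=[(-5.0, 5.0), (-5.0, 5.0)])
for _ in range(200):
    x = opt.ask()            # optimizer proposes
    opt.tell(x, sphere(x))   # caller evaluates and reports back
```

A minimize()-style wrapper is then just this loop packaged into a function, which is why the ask-tell form is the more fundamental of the two.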
On Mon, Feb 15, 2021 at 10:19 PM Robert Kern <robert.kern@gmail.com> wrote:
Thanks Robert, that seems like an interesting exercise. This reminds me of the "class based optimizers" proposal. That didn't mention ask-tell, but the "reverse communication" idea may be the same:
https://github.com/scipy/scipy/pull/8552
https://mail.python.org/pipermail/scipy-dev/2018-February/022449.html

Your comments and the scikit-optimize link are imho a better justification for doing this exercise than we had before.

Cheers,
Ralf
On Tue, Feb 16, 2021 at 9:03 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Yes, "reverse communication" is a FORTRAN-era term for the same general idea. In FORTRAN reverse-communication APIs, you would generally call the one optimizer subroutine over and over again, passing in the current state and function evaluation and reading the next point to evaluate (and other state) from "intent out" variables. "ask-tell" is a somewhat more specific instance of that idea in an OO context, where it's a fairly obvious design pattern once you have chosen to go OO and free yourself from the constraints of a FORTRAN subroutine.

--
Robert Kern
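(The FORTRAN pattern translates quite directly: the caller repeatedly re-enters the routine and performs whatever evaluation it is asked for. A Python generator makes a compact analogue; this is an illustrative sketch with invented names, shown here for simple bisection rather than optimization.)

```python
def bisect_rc(a, b, tol=1e-10):
    """Reverse-communication bisection, sketched with a generator.

    The generator yields the next point to evaluate; the driver sends
    back f(x), mimicking a FORTRAN subroutine that is re-entered with
    the latest function value until convergence.
    """
    fa = yield a                      # ask the caller for f(a)
    while b - a > tol:
        mid = 0.5 * (a + b)
        fmid = yield mid              # ask the caller for f(mid)
        if (fa < 0) == (fmid < 0):
            a, fa = mid, fmid         # root is in the upper half
        else:
            b = mid                   # root is in the lower half
    return 0.5 * (a + b)

def drive(gen, f):
    # The caller owns the evaluation loop -- the "communication".
    x = next(gen)
    try:
        while True:
            x = gen.send(f(x))
    except StopIteration as done:
        return done.value

root = drive(bisect_rc(0.0, 2.0), lambda x: x * x - 1.0)
```

Note that `drive()` is exactly the minimize()-style wrapper: the loop that the classic interface hides and the reverse-communication interface exposes.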
It's good to hear about the ask-tell interface; it's not something I'd heard about before.

The class-based Optimizer that was proposed wasn't going to work in quite that way. The main concept was to create an (e.g.) LBFGSB class (inheriting from a Minimizer superclass). All Minimizer objects would be iterators, having a __next__ method that would perform one step of a minimisation loop. Iterator-based design syncs quite well with the loop-based design of most of the existing minimisation algorithms. The __next__ method would be responsible for calling the user-supplied functions. If the user-supplied functions could be marked as vectorisable, the __next__ method could despatch a whole series of `x` locations for the user function (one or all of func/jac/hess) to evaluate; the user function could do whatever parallelisation it wanted. Vectorisable function evaluations also offer benefits for numerical differentiation. The return value of __next__ would be something along the lines of an intermediate OptimizeResult.

I don't know how the ask-tell approach works in finer detail. For example, each minimisation step typically requires multiple function evaluations to proceed: at least once for func evaluation, and many more times for grad/jac and hess evaluation (not to mention constraint function evaluations). Therefore there wouldn't be a 1:1 correspondence between a single ask-tell and a complete step of the minimizer.

I reckon the development of this would be way more than a single GSoC could provide, at least to get a mature design into scipy. It's vital to get the architecture correct (esp. the base class) when considering all the minimizers that scipy offers and their different vagaries. Implementing for one or two minimizers wouldn't be sufficient; otherwise one forgets that they e.g. all have different approaches to halting, and you find yourself bolting other things on to make things work.
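(A rough sketch of that iterator idea, with all names invented rather than taken from the actual proposal: __next__ performs one step, dispatches a whole batch of `x` locations to a vectorised user function in a single call, and returns an intermediate result.)

```python
import numpy as np

class RandomStepMinimizer:
    """Toy iterator-style minimizer, sketching the class-based idea.

    Invented for illustration -- not the proposal's code. Each
    __next__ performs one step and dispatches a batch of candidate
    points to a vectorised (possibly parallel) user function.
    """
    def __init__(self, func, x0, step=0.5, batch=8, seed=1):
        self.func = func
        self.x = np.asarray(x0, dtype=float)
        self.fx = func(self.x[None, :])[0]   # func accepts (n, dim) batches
        self.step = step
        self.batch = batch
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        return self

    def __next__(self):
        # Propose a batch of candidates around the current best point.
        cand = self.x + self.step * self.rng.standard_normal(
            (self.batch, self.x.size))
        fvals = self.func(cand)              # one vectorised dispatch
        i = int(np.argmin(fvals))
        if fvals[i] < self.fx:
            self.x, self.fx = cand[i], fvals[i]
        else:
            self.step *= 0.5                 # shrink the step on failure
        # Intermediate result, akin to a stripped-down OptimizeResult.
        return {"x": self.x.copy(), "fun": float(self.fx)}

def sphere(X):
    # Vectorised objective: X has shape (n_points, dim).
    return np.sum(X * X, axis=1)

opt = RandomStepMinimizer(sphere, x0=[3.0, -2.0])
for _, res in zip(range(100), opt):
    pass
```

The user function decides for itself how to evaluate the batch, which is how the vectorisable marking would let parallelisation live entirely on the user's side.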
In addition, it's not just the minimizer methods that are involved; it's considering how this all ties in with how constraints/numerical differentiation/`LowLevelCallable`/etc. could be improved/used in such a design. At least for the methods involved in `minimize`, such an opportunity is the time to consider a total redesign of how things work. Smart/vectorisable numerical differentiation would be more than a whole GSoC in itself.

As Robert says, implementation in a separate package would probably be the best way to work; once the bugs have been ironed out it could be merged into scipy proper. Any redesign could take into account the existing APIs/functionality to make things a less jarring change.

It'd be great to get the original class-based Optimizer off the ground, or something similar. However, it's worth noting that the original proposal only received lukewarm support.

A.
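(As one concrete example of what vectorisable numerical differentiation could buy: a forward-difference gradient that hands the base point and all perturbed points to the user function in a single batched call. Sketch only; the helper name is invented.)

```python
import numpy as np

def grad_fd_batched(func, x, h=1e-6):
    """Forward-difference gradient via ONE batched function call.

    Illustrative sketch -- not a SciPy API. A vectorised/parallel
    `func` (taking an (n, dim) array) evaluates the base point and all
    dim perturbed points together instead of in dim+1 separate calls.
    """
    x = np.asarray(x, dtype=float)
    pts = np.vstack([x, x + h * np.eye(x.size)])  # shape (1 + dim, dim)
    f = func(pts)                                 # single dispatch
    return (f[1:] - f[0]) / h

def sphere(X):
    return np.sum(X * X, axis=1)

g = grad_fd_batched(sphere, [1.0, -3.0])  # true gradient is (2, -6)
```

For an expensive, parallelisable objective, the single dispatch is the whole point: the dim+1 evaluations can run concurrently on the user's side.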
Hi!

I've added an idea about implementing an object-oriented design of filtering in scipy.signal. It was discussed quite a lot in the past; I think it's a sane idea, and scipy.signal can definitely be made more user friendly and convenient.

This is only a preliminary view on the project. Feel free to edit the text. So far I've put only myself as a possible mentor.

Nikolay

_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@python.org
https://mail.python.org/mailman/listinfo/scipy-dev
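(To make the kind of convenience an object-oriented scipy.signal design could offer concrete, here is a sketch of a filter object that keeps its internal state, so a long signal can be processed in chunks. The class name is invented, not a proposed API; it is built on the existing scipy.signal functions.)

```python
import numpy as np
from scipy.signal import butter, lfilter, lfilter_zi

class StreamingFilter:
    """Toy stateful IIR filter -- an illustration, not a scipy API.

    Keeps the filter state (zi) between calls so that processing a
    signal chunk by chunk gives the same result as one-shot filtering.
    """
    def __init__(self, b, a):
        self.b, self.a = b, a
        self.zi = lfilter_zi(b, a) * 0.0   # start from rest

    def process(self, chunk):
        y, self.zi = lfilter(self.b, self.a, chunk, zi=self.zi)
        return y

b, a = butter(4, 0.2)               # 4th-order low-pass
x = np.random.default_rng(0).standard_normal(1000)

f = StreamingFilter(b, a)
y_chunks = np.concatenate([f.process(c) for c in np.split(x, 10)])
y_once = lfilter(b, a, x)           # one-shot reference
```

Because the state is carried across calls, `y_chunks` matches `y_once` -- the kind of guarantee that is easy to get wrong when users juggle `zi` by hand.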
Hi Nikolay,

Thanks for adding that! I agree there's a lot to improve there. It's a nice and detailed description, I don't have much to add.

Gmail unhelpfully classified your email as spam, so I thought I'd reply on the list in case that happened for other people as well.

Cheers,
Ralf
Ralf, thanks for the feedback! Waiting for a good student to apply for this project :)

Nikolay
participants (5)

- Andrew Nelson
- Nikolay Mayorov
- Pamphile Roy
- Ralf Gommers
- Robert Kern