Re: [SciPy-Dev] Sensitivity analysis module proposal (Robert Kern)

Hi all,

First things first, there are too many things to discuss and email is not appropriate. I propose to make a call with people interested next week or the week after. Let me know if we can do this and what time would work so I can put together a doodle.

Some replies on specific points:

Let me preface this by saying that I am not gaining anything from working on SciPy. I have no project related to SA at the moment and nothing is planned. I am only giving some personal time and have zero financial support for my time.

- SALib into SciPy: I proposed this following your input, Robert. You said that if the maintainers of this library were willing to do so, then we could discuss. I am very happy to see that they would actually be willing to do this and help.

- Funding: I can only disagree. Yes, being part of SciPy does not directly open a line of credit, but it does help indirectly. Art Owen already told me that he is using the fact that he helped us on stats.qmc to get some funding. And I am certain some of us also use this with their current employers.

- Bus factor: We are way past the bus factor. There is the team behind SALib, me, and I have already had interest from people like Sergei Kucherenko (one of the pillars of SA). And if we go that way, I am 100% sure Saltelli (another pillar) will be interested (I worked with him on a tooling project). People like them are willing to contribute only because it's SciPy. We had this discussion at MCM2021 when I presented SciPy's QMC module to the QMC community. Researchers, at least from the communities I know and have talked with, do not want to contribute to something that is uncertain or linked to a particular group. With SciPy they see an opportunity to have a large impact across a wide range of fields, and also the assurance of a long-term impact.

- Why SA? How is this important? SA is becoming a major field, not only by itself, but because other fields are starting to use its methods: Shapley values in AI, importance factors, etc. In most engineering fields, it is now mandatory to assess uncertainties, hence there is a real need for tooling. There are mature libraries in the field. OpenTURNS, UQLab and Dakota are the most used among practitioners, and none of them is as independent and open as we are (which partially explains why people like Sergei or Saltelli are not contributors). I will not go into why I think these libraries should not be used.

I really don't get the push back on this one. It's about adding a few methods to stats at most, and the benefits would arguably be huge compared to some functions/submodules we added recently. Behind it would be the most renowned people of the field, and that could give another great exposure for SciPy. We would talk about SciPy at new conferences, etc. My presentation of QMC had a great impact in the QMC community. I had 4 concrete proposals of collaboration just during the conference. We went from not being in one field to being the recommended tool of a community.

SciPy is at the foundation of the scientific ecosystem in Python, and not having basic tooling about uncertainty/SA that other higher-level packages could rely on is a puzzle for me.

As I said at the beginning, let's have a talk to decide what's in the best interest of SciPy.

Cheers,
Pamphile
On 3 Sep 2021, at 03:35, Robert Kern <robert.kern@gmail.com> wrote:
On Thu, Sep 2, 2021 at 3:30 PM William Usher <wusher@kth.se> wrote:
Hi Robert,
Thanks for the response. You raise good points.
Obviously, that you are interested in the proposal assuages some of that, but I'm still unclear on why you are interested in this. What is the benefit that you think everyone will get by absorbing SALib into scipy? It still looks to me like a mostly-lateral move that will merely be disruptive to your dependent projects more than anything else.
The real value of SALib is in providing a consistent interface to a (large) suite of sensitivity analysis methods which allows users to easily switch between those methods.
We (as maintainers) could benefit from reducing duplication of code and implementations, such as Sobol’ sequence generation and LHS, and could contribute some of the more general sample generation implementations where appropriate (many are linked directly to the SA implementations and not useful outside of that).
I definitely think anything that could plausibly fall in the purview of scipy.stats.qmc would be a good target for convergence. If you can't use scipy.stats.qmc due to missing functionality (and not just that you don't want to require that recent version of scipy), then let's see how we can shore it up. I think that where your sampling methods overlap with design of experiments in general might also be a fruitful place for scipy.stats to grow/absorb some community-wide functionality.
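For context, here is a minimal sketch (not part of the original thread) of the sample-generation functionality that scipy.stats.qmc already provides as of SciPy 1.7 - the Sobol' and Latin hypercube engines plus scaling and discrepancy helpers - i.e., the layer on which the convergence mentioned above could happen:

```python
import numpy as np
from scipy.stats import qmc

# Scrambled Sobol' sequence in 3 dimensions (power-of-two sizes are recommended)
sobol_sample = qmc.Sobol(d=3, scramble=True, seed=0).random(128)

# Latin hypercube sample of the same shape
lhs_sample = qmc.LatinHypercube(d=3, seed=0).random(128)

# Rescale unit-hypercube samples to a physical domain, e.g. [-pi, pi]^3
scaled = qmc.scale(sobol_sample, l_bounds=[-np.pi] * 3, u_bounds=[np.pi] * 3)

# Centered discrepancy as a uniformity measure for either design
print(qmc.discrepancy(sobol_sample), qmc.discrepancy(lhs_sample))
```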
We think the scientific community as a whole would benefit from the greater exposure a SciPy implementation of SA would bring: as a large community-led effort, it could provide a neutral forum for further development of these methods. This would likely come at some “cost” to the successful “cottage industry” we’ve established and grown (SALib is getting lots of citations and use).
That is gold. I am having trouble understanding why you would contemplate anything that might place that at risk. If I were dictator here, I would reject you out of hand for your own good. ;-)
I think a key argument against integration is that it may reduce the agility with which we can add new methods to our SA suite (although this could be mitigated with careful design).
I think you raise an important point about our dependent projects, and particularly how we would continue to support legacy releases if developer resources were focussed on a SciPy integration?
An attraction is the possibility of funding to support the development of SA within SciPy. Like all open-source projects, we suffer from resourcing issues, and are predominantly volunteer-driven from the academic community. And while we are technically a “multi-developer” community, we’re only a few bus accidents (or career changes) away from being a lone-maintainer project.
That's more required bus accidents than many parts of scipy. :-)
I'm afraid that contributing into scipy doesn't unlock a pot of funds. I'm out of the grant-writing game, but I suspect that the difference between applying for funds on your own and arguing for allocation from competing needs inside of scipy is mostly a push. Have you considered applying to NumFocus? I occasionally look at their accepted and rejected projects, and I'd lump you in with the former, IMO. You're doing important work, playing well with other projects in the ecosystem, and have at least the seed kernel of sustainable community development so that funds are likely to actually sustain development.
-- Robert Kern

[private] On Fri, Sep 3, 2021 at 1:16 PM Pamphile Roy <roy.pamphile@gmail.com> wrote:
Hi all,
First things first, there are too many things to discuss and email is not appropriate. I propose to make a call with people interested next week or the week after. Let me know if we can do this and what time would work so I can put together a doodle.
Some replies on specific points:
This is not the first time that you have declared a forum inappropriate for discussion as a prelude to dropping a lengthy final argument in that same forum. That's not an acceptable way to treat your colleagues. -- Robert Kern

On Fri, Sep 3, 2021 at 7:32 PM Robert Kern <robert.kern@gmail.com> wrote:
[private]
I will reply to the rest of the thread later today.

It looks like this message went to the scipy-dev list instead of in private by accident. That happens. I think I understand the tensions and disconnect in this conversation. I have planned a call with Pamphile to talk about how to structure the conversation differently next time, and also about how we are used to discussing and making decisions about SciPy development and community topics here. I think we all have the best of intentions in moving the project forward, so I suggest leaving the public meta conversation here.

Cheers,
Ralf

On Tue, Sep 14, 2021 at 10:46 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Sep 3, 2021 at 7:32 PM Robert Kern <robert.kern@gmail.com> wrote:
[private]
I will reply to the rest of the thread later today.
It looks like this message went to the scipy-dev list instead of in private by accident.
Everyone, but especially Pamphile, has my deepest apologies. -- Robert Kern

On Fri, Sep 3, 2021 at 7:15 PM Pamphile Roy <roy.pamphile@gmail.com> wrote:
Hi all,
First things first, there are too many things to discuss and email is not appropriate. I propose to make a call with people interested next week or the week after. Let me know if we can do this and what time would work so I can put together a doodle.
I think I understand the point you were trying to make - this is a large topic, and email is not the easiest conversation mechanism. So having a call with people who are interested to hash out a few things can be very useful. However, let me point out that the purpose of that would be to get to a clearer proposal - final discussions and decisions on new features and directions for SciPy development have always been made on this list and will continue to be. We will need a better-scoped proposal, like a list of functionality that should go into (e.g.) `scipy.stats`.
Some replies on specific points:
Let me preface this by saying that I am not gaining anything from working on SciPy. I have no project related to SA at the moment and nothing is planned. I am only giving some personal time and have zero financial support for my time.
This is true for almost everyone at the moment (good news is that we do have a new grant from CZI starting soon, for ~0.5 FTE).
- SALib into SciPy:
I proposed this following your input, Robert. You said that if the maintainers of this library were willing to do so, then we could discuss. I am very happy to see that they would actually be willing to do this and help.
- Funding:
I can only disagree. Yes, being part of SciPy does not directly open a line of credit, but it does help indirectly. Art Owen already told me that he is using the fact that he helped us on stats.qmc to get some funding. And I am certain some of us also use this with their current employers.
I think both points of view are valid here. For dedicated direct funding, like a grant for developing new SA methods, a separate project is probably better. For "secondary benefits", being able to say that something goes into SciPy - and therefore reaches potentially O(10 million) users - can be powerful.
- Bus factor:
We are way past the bus factor. There is the team behind SALib, me, and I have already had interest from people like Sergei Kucherenko (one of the pillars of SA). And if we go that way, I am 100% sure Saltelli (another pillar) will be interested (I worked with him on a tooling project).
It's pure Python code and relatively straightforward technically, so if there's a commitment from more than one person to be maintainers, I think this part doesn't worry me.
People like them are willing to contribute only because it's SciPy. We had this discussion at MCM2021 when I presented SciPy's QMC module to the QMC community. Researchers, at least from the communities I know and have talked with, do not want to contribute to something that is uncertain or linked to a particular group. With SciPy they see an opportunity to have a large impact across a wide range of fields, and also the assurance of a long-term impact.
This is not the primary factor for deciding whether or not something belongs in SciPy, but it's helpful to know that there will be expert support/reviewers. When adding new functionality, ensuring correctness and fit for purpose of algorithms is quite often a pain point when deciding whether to merge a PR.
- Why SA? How is this important?
SA is becoming a major field, not only by itself, but because other fields are starting to use its methods: Shapley values in AI, importance factors, etc. In most engineering fields, it is now mandatory to assess uncertainties, hence there is a real need for tooling.
I think this is where more detail and data is needed. For reference, these are the sub-submodules we added in the past 10 years: `stats.qmc`, `spatial.transform`, `signal.windows`, Cython interfaces for `linalg` and `special`, `sparse.csgraph`. We haven't added any new top-level submodules in over a decade. All of those seem more "general" than SA, except perhaps `stats.qmc`, which has a similar audience. So I'd say that a new top-level submodule does not sound like a good fit. A sub-submodule or just a set of functions/classes in `scipy.stats` could make sense on the other hand. The question is what that list should look like to make this effort make sense for both SciPy and SALib.

There are mature libraries in the field. OpenTURNS, UQLab and Dakota are the most used among practitioners, and none of them is as independent and open as we are (which partially explains why people like Sergei or Saltelli are not contributors). I will not go into why I think these libraries should not be used.
I really don't get the push back on this one. It's about adding a few methods to stats at most, and the benefits would arguably be huge compared to some functions/submodules we added recently.
That sounds like what I said before - so let's make that list, to make the conversation concrete.

Behind it would be the most renowned people of the field, and that could give another great exposure for SciPy. We would talk about SciPy at new conferences, etc.
My presentation of QMC had a great impact in the QMC community. I had 4 concrete proposals of collaboration just during the conference. We went from not being in one field to being the recommended tool of a community.
SciPy is at the foundation of the scientific ecosystem in Python, and not having basic tooling about uncertainty/SA that other higher-level packages could rely on is a puzzle for me.
That is usually a good reason to put something in SciPy: if other packages with a significant user base need/want it, and those do not want to rely on a smaller package like SALib as a dependency. This is how we got sparse graph algorithms for example - upstreamed from scikit-learn and then expanded.

Cheers,
Ralf
As I said at the beginning, let’s have a talk to decide what’s in the best interest of SciPy.
Cheers,
Pamphile

Hi everyone,

As Ralf mentioned earlier, we had a chat about the way I framed my proposal and how I communicated my intent. Mainly, I should have been more precise from the start to avoid some misunderstanding, especially on the size/scope of the idea. Hence, I would like to continue the discussion with a concrete proposal.

Proposal

Add a single Sensitivity Analysis function: scipy.stats.sobol_indices. The function would compute Sobol' indices [1,2].

Consider a function f with parameters x1, x2 and x3, so that y = f(x1, x2, x3). We are interested in knowing which parameter has the most impact, in terms of variance, on the value y. Let's take an example with the Ishigami function:

    y = sin(x1) + 7*sin(x2)**2 + 0.1*x3**4*sin(x1)

It is not obvious which variable impacts y the most. The Sobol' indices are bounded between 0 and 1, with 1 meaning more important. Here they would be:

    Variable    First-order Sobol'    Total Sobol'
    x1          0.31                  0.56
    x2          0.44                  0.44
    x3          0.0                   0.24

A difference between the first-order and total indices indicates an interaction between variables. The total indices allow ranking the variables by importance: x1 is the most important. Looking at the first-order indices, x3 by itself does not have an impact on the variance of the output; it is its combination with another variable which gives it a total impact of 0.24. x1 also shows a difference between its first-order and total indices, while for x2 they are equal. We can say that x1 and x3 have a second-order interaction.

Implementation

The signature could look like:

    def sobol_indices(f: Callable, bounds: np.ndarray, n: int) -> np.ndarray:

The underlying code is fairly simple and here is an example implementation I did a while back: https://gist.github.com/tupui/09f065d6afc923d4c2f5d6d430e11696

Existing packages: e.g. SALib

Our implementation would be written jointly with SALib maintainers and experts from SA to ensure it helps the existing and new communities while allowing easier maintenance and extensibility. SALib has an implementation of Sobol' indices here: https://github.com/SALib/SALib/blob/main/src/SALib/analyze/sobol.py. I believe the feature set is complete, but we might be able to generalize the implementation. For instance, a method parameter could be used to select different formulas (FAST, Saltelli, Saltenis, etc.), n_processors and parallel could be linked, conf_level and num_resamples could also be rethought since some formulas include asymptotic confidence intervals, and problem could be simplified to accept a callable/array of function evaluations. These are just examples.

In the end, libraries like SALib could take advantage of our implementation to build extra features on: including but not limited to plotting, sampling, run handling, further analysis (such as UQ, optimization), etc.

Background

Sobol' indices are the cornerstone of SA and UQ, and their application is not restricted to any field. It is a non-intrusive method whose only assumption is that the variables are independent (a constraint that can be alleviated). Being able to compute sensitivity indices allows reducing the dimensionality of a problem, better understanding the importance of each factor, and also seeing how parameters interact with each other. As such, it is an important engineering tool: if you have 2 variables and only have the budget to improve your knowledge of one of them, this can help to make a choice.

There are a lot of successful uses of SA in the literature and in real-world applications. The EU (through the JRC) now requires uncertainty analysis to be conducted when evaluating a system, and recommends the use of Sobol' indices.

References

Upon request, I can provide more information. Here are two famous references.

[1] Sobol, I.M. (2001), Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55(1-3), 271-280. doi:10.1016/S0378-4754(00)00270-6
[2] Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., and Tarantola, S. (2008), Global Sensitivity Analysis: The Primer, John Wiley & Sons.

Cheers,
Pamphile
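To make the proposed signature more tangible, here is a minimal, hypothetical sketch (not the proposed implementation, and not the code in the gist above) of how such a function could be built on the existing scipy.stats.qmc.Sobol engine, using the standard Saltelli (2010) and Jansen pick-and-freeze estimators on the Ishigami example. The ishigami helper, the seed parameter and the (first, total) return value are illustrative choices, not part of the proposal.

```python
import numpy as np
from scipy.stats import qmc


def ishigami(x):
    """Ishigami test function from the proposal; x has shape (n, 3)."""
    return np.sin(x[:, 0]) + 7 * np.sin(x[:, 1])**2 + 0.1 * x[:, 2]**4 * np.sin(x[:, 0])


def sobol_indices(f, bounds, n, seed=None):
    """Sketch of the proposed API: first-order and total Sobol' indices."""
    bounds = np.asarray(bounds, dtype=float)      # shape (d, 2): [lower, upper] per input
    d = len(bounds)
    # Draw two independent blocks A and B from one 2d-dimensional Sobol' sequence.
    sample = qmc.Sobol(d=2 * d, scramble=True, seed=seed).random(n)
    sample = qmc.scale(sample, np.tile(bounds[:, 0], 2), np.tile(bounds[:, 1], 2))
    A, B = sample[:, :d], sample[:, d:]
    f_A, f_B = f(A), f(B)
    var = np.var(np.concatenate([f_A, f_B]))
    first, total = np.empty(d), np.empty(d)
    for i in range(d):
        AB_i = A.copy()
        AB_i[:, i] = B[:, i]                      # "pick-and-freeze": vary only x_i
        f_AB_i = f(AB_i)
        first[i] = np.mean(f_B * (f_AB_i - f_A)) / var       # Saltelli (2010) estimator of S_i
        total[i] = 0.5 * np.mean((f_A - f_AB_i)**2) / var    # Jansen estimator of ST_i
    return first, total


S, ST = sobol_indices(ishigami, bounds=[[-np.pi, np.pi]] * 3, n=2**13, seed=7)
print(np.round(S, 2), np.round(ST, 2))   # approximately [0.31 0.44 0.] and [0.56 0.44 0.24]
```

A real implementation would of course also need the method selection, confidence intervals and vectorized-output handling discussed above; this only illustrates that the core computation is a thin layer over the existing QMC machinery.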

On Mon, Sep 20, 2021 at 11:46 AM Pamphile Roy <roy.pamphile@gmail.com> wrote:
Hi everyone,

As Ralf mentioned earlier, we had a chat about the way I framed my proposal and how I communicated my intent. Mainly, I should have been more precise from the start to avoid some misunderstanding, especially on the size/scope of the idea. Hence, I would like to continue the discussion with a concrete proposal.

Proposal

Add a single Sensitivity Analysis function: scipy.stats.sobol_indices. The function would compute Sobol' indices [1,2]. [...]
Thank you for the concrete proposal. I'm afraid that it still leaves me with the opinion that we should encourage people to use and contribute to the SALib implementation instead of adding this to scipy. In my estimation, the opportunities to improve upon the generalizability, reusability, and maintenance of these implementations are best exploited by contributing to the development of SALib. Other than the Sobol' sequence generation, which is already in scipy thanks to your work, the remaining reusable functionality (essentially, just these two lines of formulae <https://gist.github.com/tupui/09f065d6afc923d4c2f5d6d430e11696#file-sobol_sa...>) is not really any more reusable if it were inside of scipy. -- Robert Kern

Hi Robert and all,

I again came across this proposal from Pamphile, and it looks like there has not yet been a firm resolution.

On Mon, Sep 20, 2021, at 10:36, Robert Kern wrote:
Thank you for the concrete proposal. I'm afraid that it still leaves me with the opinion that we should encourage people to use and contribute to the SALib implementation instead of adding this to scipy. In my estimation, the opportunities to improve upon the generalizability, reusability, and maintenance of these implementations are best exploited by contributing to the development of SALib. Other than the Sobol' sequence generation, which is already in scipy thanks to your work, the remaining reusable functionality (essentially, just these two lines of formulae <https://gist.github.com/tupui/09f065d6afc923d4c2f5d6d430e11696#file-sobol_sa...>) is not really any more reusable if it were inside of scipy.
To the last point, part of SciPy's purpose is to implement algorithms well and to guide users in their use; so, even though it may only be a few lines of code, surrounding docstrings, gallery examples, etc. add value and educate.

I don't know this area well, but I am wondering if it would make sense to include this function as an on-ramp to using SALib? I.e., the docstring would show how to use it, and direct users to SALib should the results indicate that further sensitivity analysis is necessary. The code is pure Python, so the maintenance burden is there but should not be high.

Stéfan