Hi everyone,

In a recent PR (https://github.com/scipy/scipy/pull/14284), we had a discussion about static typing. My understanding from previous PRs, the inclusion of MyPy in the CI and discussions during our last community meetups, was that we wanted to gradually start adding type annotations in our code base. Thanks to the great help of Bas van Beek and others, we have a few common bricks we can use to ease the process.

What is your opinion about adding annotations?

I expressed my position in the issue. TL;DR I am in favour of adding type annotations. This greatly helps anyone using an IDE, and downstream libraries like SciKit-Learn are basically waiting for us. Lastly, I am putting here again the link to the last Python Community survey, https://www.jetbrains.com/lp/python-developers-survey-2020/, where static typing is the most wanted feature.

Cheers, Pamphile
I'm probably -0.5; I've been merging PRs from Bas and others that add the typing on the assumption that folks want it, and because it has found a few bugs.

But I am definitely not a fan of how it makes the code look when I'm working in, e.g., vim, to say the least. And some of the Cython stub files are an absolute mess to read. A clean separation between dynamic Python glue code and statically typed C/C++ would be preferable in my opinion, but then you can argue that the glue basically is the code these days. Another view I've seen expressed is that adding type hints merely provides an "illusion of productivity"; while that's a bit harsh, I do see where it is coming from compared to other priorities.

There's a bit of an assumption that we'd be left behind if we don't add the type hints, and maybe someday auto-transpiling or whatever will start to happen more seriously and then the code is just slower. And/or we slowly get dropped/forked/whatever by all the frameworks that need us to provide type hints to check for bugs in massive machine learning workflows and so on. So, I'd love to be grumpy and not put them in, but I'm afraid it may be a path to getting left behind.

_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@python.org
https://mail.python.org/mailman/listinfo/scipy-dev
On Mon, Jun 28, 2021 at 11:49 PM Tyler Reddy <tyler.je.reddy@gmail.com> wrote:
> I'm probably -0.5; I've been merging PRs from Bas and others that add the typing on the assumption that folks want it, and because it has found a few bugs.
>
> On Mon, 28 Jun 2021 at 09:13, Pamphile Roy <roy.pamphile@gmail.com> wrote:
>> What is your opinion about adding annotations?

I think the jury is still out. On the one hand there are a lot of users who are enthusiastic about having type annotations. And they do have benefits for IDE users, for people who want to use annotations in their own code which uses SciPy, etc.

For SciPy development itself, my experience is that when one writes *new code*, type annotations can be nice - they kind of force you to write code that's sane from a typing perspective, which also means it avoids design mistakes like subclasses that break substitutability or special-casing scalar returns. On the other hand, for *existing code*, they don't help much and are a pain to add. And a lot of what we do is work with existing code.

Another issue is that Mypy and other typing-related tools are still beta-quality, so they come with both bugs and gaps in functionality.

So I think adding annotations now is in the "let's see how well this works" phase. I think we should try it in particular with new code, where it has the most benefits, and accept PRs that add annotations to code we already have. We haven't done that enough to figure out the cost/benefit ratio for ourselves, and comparing that to the benefits for users is even more difficult.

Cheers, Ralf
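To make the "subclasses that break substitutability" remark concrete, here is a minimal sketch. The classes `Container` and `IntOnlyContainer` and the function `halve` are hypothetical and are not SciPy or NumPy code. The assumption is that a checker such as mypy is run over the file: it is expected to report the override as incompatible with the supertype, because the subclass narrows the accepted argument type.

```python
from __future__ import annotations

from typing import Sequence


class Container:
    """Hypothetical base class: scale() accepts any real number."""

    def __init__(self, values: Sequence[float]) -> None:
        self.values = list(values)

    def scale(self, factor: float) -> list[float]:
        return [v * factor for v in self.values]


class IntOnlyContainer(Container):
    # The parameter type is narrowed from float to int, so an
    # IntOnlyContainer can no longer stand in for a Container.
    # A static type checker is expected to flag this override.
    def scale(self, factor: int) -> list[float]:
        if not isinstance(factor, int):
            raise TypeError("factor must be an int")
        return [v * factor for v in self.values]


def halve(c: Container) -> list[float]:
    # Valid for any Container that honours the base-class contract.
    return c.scale(0.5)
```

Without a type checker the problem only shows up at run time, when `halve` is handed an `IntOnlyContainer` and the call raises `TypeError`.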
On Wed, Jun 30, 2021 at 12:03 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
> So I think adding annotations now is in the "let's see how well this works" phase. I think we should try it in particular with new code, where it has the most benefits, and accept PRs that add annotations to code we already have. We haven't done that enough to figure out the cost/benefit ratio for ourselves, and comparing that to the benefits for users is even more difficult.

ISTM it's important that annotations are optional in the sense that we do not explicitly require that new code is typed. If someone is willing to add them, great (and if someone is willing to review a typing PR, even better :-)). But this should be possible to do in a follow-up PR, not as a requirement for an enhancement PR.

If/when we really do get into the brave new fully typed world, we can reconsider :-).

Cheers, Evgeni
On Wed, Jun 30, 2021, at 09:03, Evgeni Burovski wrote:
> ISTM it's important that annotations are optional in the sense that we do not explicitly require that new code is typed. If someone is willing to add them, great (and if someone is willing to review a typing PR, even better :-)). But this should be possible to do in a follow-up PR, not as a requirement for an enhancement PR.

I agree, especially given that the typing notation is still changing. For example, they're currently working out a shorthand for typing function definitions (and I'm sure other simplifications are in the pipeline too).

I'd still like to find proof that typing has significant impact.

There have been some studies in JavaScript land that give rough metrics like "1/6 bugs could have been identified with typing" [0]. But then you see the Flask team annotating their entire project and finding almost none; probably because in Python we tend to test differently. We also tend to have more functional interfaces that return straightforward built-in objects.

For now, I feel typing still mostly benefits IDE users. Perhaps in the future we'll see the accelerated frameworks Tyler referred to using it as well.

Stéfan

[0] https://earlbarr.com/publications/typestudy.pdf
On Wed, Jun 30, 2021 at 8:07 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
> On Wed, Jun 30, 2021, at 09:03, Evgeni Burovski wrote:
>> ISTM it's important that annotations are optional in the sense that we do not explicitly require that new code is typed.
>
> I agree, especially given that the typing notation is still changing.

That is the current state, and I agree it should stay like that.

> I'd still like to find proof that typing has significant impact.

Consider this: having type annotations + mypy would have prevented writing np.matrix, and we would have sane ndarray subclassing. I think that's the main impact. Finding bugs it doesn't help much with (and the ones it does catch are usually easy ones).

Cheers, Ralf
On Wed, Jun 30, 2021 at 2:15 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
> Consider this: having type annotations + mypy would have prevented writing np.matrix, and we would have sane ndarray subclassing. I think that's the main impact. Finding bugs it doesn't help much with (and the ones it does catch are usually easy ones).

I don't think it would have prevented np.matrix. The demand for that was too high.

(For statsmodels) I worry more that it limits flexibility or gets too complicated, e.g. "instance with interface like scipy.distributions but only cdf and ppf need to be available".

Josef
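For the "only cdf and ppf need to be available" case, structural typing via `typing.Protocol` (Python 3.8+) is one way such a requirement might be written down. The sketch below is an illustration only: `HasCdfPpf`, `Logistic` and `central_interval` are hypothetical names and are not part of SciPy or statsmodels, and the scalar `float` signatures are a simplifying assumption.

```python
from __future__ import annotations

import math
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasCdfPpf(Protocol):
    """Hypothetical structural type: anything with cdf and ppf methods."""

    def cdf(self, x: float) -> float: ...

    def ppf(self, q: float) -> float: ...


class Logistic:
    """Toy standard logistic distribution; it subclasses nothing."""

    def cdf(self, x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))

    def ppf(self, q: float) -> float:
        return math.log(q / (1.0 - q))


def central_interval(dist: HasCdfPpf, coverage: float) -> tuple[float, float]:
    # Only the methods named in the protocol are required;
    # pdf, rvs and the rest of a full distribution may be absent.
    tail = (1.0 - coverage) / 2.0
    return dist.ppf(tail), dist.ppf(1.0 - tail)
```

A type checker accepts `central_interval(Logistic(), 0.9)` because `Logistic` has matching methods, with no inheritance involved. The `runtime_checkable` decorator additionally allows an `isinstance` check, which only verifies that the method names exist.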
On Wed, Jun 30, 2021 at 08:14:55PM +0200, Ralf Gommers wrote:
> On Wed, Jun 30, 2021 at 8:07 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
>> I agree, especially given that the typing notation is still changing. For example, they're currently working out a shorthand for typing function definitions (and I'm sure other simplifications are in the pipeline too).

I think it's worth noting that some numpy interfaces are inherently incompatible with fine-grained static typing. One simple example would be

    def foo(x: ndarray[int, :, :], strict: bool):
        return np.mean(x, keepdims=strict)

What should be the return type of `foo`? We can't tell precisely, because it depends on the runtime value of strict. We're left with something along the lines of "this returns an array of the same dimension or a scalar of the same dtype".

I don't know how much this dynamicity leaks to the scipy interface, but it does look like a difficult problem to solve.
On Wed, Jun 30, 2021 at 10:37 PM Serge Guelton <serge.guelton@telecom-bretagne.eu> wrote:
> I think it's worth noting that some numpy interfaces are inherently incompatible with fine-grained static typing. One simple example would be
>
>     def foo(x: ndarray[int, :, :], strict: bool):
>         return np.mean(x, keepdims=strict)
>
> What should be the return type of `foo`? We can't tell precisely, because it depends on the runtime value of strict. We're left with something along the lines of "this returns an array of the same dimension or a scalar of the same dtype".

The "boolean keyword to control return type or shape" is (unfortunately) so common that there's a specific way to deal with this, using @overload:

    @overload
    def foo(x: ndarray[int, :, :], strict: Literal[True]) -> ndarray[int, :, :]: ...
    @overload
    def foo(x: ndarray[int, :, :], strict: Literal[False]) -> ndarray[int, :]: ...
    # The fallback: if a user passes `strict='this-is-true'` then we have
    # to guess (unless we raise an exception)
    def foo(x: ndarray[int, :, :], strict: bool) -> ndarray[int, :]: ...

See https://mypy.readthedocs.io/en/stable/literal_types.html. So the idea is to treat `True` and `False` as distinct types. And if you build, e.g., a compiler for Python code then do the same.

This is fairly painful and ugly, but doable. There are other behaviors and functions in numpy code that are harder to deal with, things like value-based casting, output shapes that depend on (array) input data, and returning scalars instead of 0-D arrays.

I totally agree that boolean keywords are best avoided, but at least there is a solution if they do happen.

> I don't know how much this dynamicity leaks to the scipy interface, but it does look like a difficult problem to solve.

SciPy is just as bad as NumPy in this respect. For example, scipy.stats does this a lot:

    if y.ndim == 0:
        y = y[()]  # return a float rather than an array here
    return y

Type checkers will complain loudly about this kind of thing, so having a type checker in CI warns you about this being a bad pattern. On the other hand, to add correct type annotations to old code that's already like that, you have to jump through a lot of hoops.

Cheers, Ralf
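The `@overload` plus `Literal` pattern can be tried without NumPy. The following is a pure-Python stand-in, not the NumPy API: the function `mean` and its `keepdims` flag (which here wraps the result in a one-element list) are hypothetical and chosen only so that the return type depends on a boolean argument, as in the `foo` example.

```python
from __future__ import annotations

from typing import Literal, Sequence, overload


@overload
def mean(values: Sequence[float], keepdims: Literal[True]) -> list[float]: ...
@overload
def mean(values: Sequence[float], keepdims: Literal[False]) -> float: ...
@overload
def mean(values: Sequence[float], keepdims: bool) -> float | list[float]: ...
def mean(values: Sequence[float], keepdims: bool) -> float | list[float]:
    # Single runtime implementation; the overloads above exist only
    # for the type checker and are never executed.
    result = sum(values) / len(values)
    return [result] if keepdims else result
```

With these declarations a type checker can infer `float` for `mean(xs, False)` and `list[float]` for `mean(xs, True)`. When the flag is only known at run time it falls back to the union, which is the imprecise case described above.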
How would you annotate scipy distributions inputs?

args, kwargs, flexible number of parameters, parameters that can be kwargs or args.

And how would this affect subclasses?

The distribution classes have a lot of input validation, and it took me some time recently to get a subclass to fit their design.

Josef
On Fri, Jul 2, 2021 at 6:01 PM <josef.pktd@gmail.com> wrote:
> How would you annotate scipy distributions inputs?
>
> args, kwargs, flexible number of parameters, parameters that can be kwargs or args.
>
> And how would this affect subclasses?
>
> The distribution classes have a lot of input validation, and it took me some time recently to get a subclass to fit their design.

In their current form, they're basically impossible to type (and we shouldn't try). If we'd rewrite the framework, it would not look anything like the current design though - everything `def some_method(x, *args, **kwargs)` is madness, and all shape parameters broadcasting is madness^2 - we're still finding new bugs there after 10+ years.

It's a complex topic so a new design would require a lot of thought, but I think the public methods would look something like:

    def pdf(x: ndarray, a: float, loc: float | None, scale: float | None) -> ndarray: ...

    def rvs(a: float, size: int, loc: float | None, scale: float | None,
            random_state: np.random.Generator | int | None) -> ndarray: ...

It would not have `rv_xxx` base classes, but factory functions to generate classes. No third-party subclassing, just provide the tools to define your own classes with the factory functions.

Cheers, Ralf
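As a rough illustration of what "factory functions to generate classes" could mean, here is one possible reading, reduced to scalars and pure Python. Everything in it is an assumption made for the example: `make_distribution`, `Normal` and the default values for `loc` and `scale` are hypothetical and do not correspond to any existing or planned SciPy API.

```python
from __future__ import annotations

import math
from typing import Callable


def make_distribution(name: str, std_pdf: Callable[[float], float]) -> type:
    """Hypothetical factory: build a class from a standardized pdf."""

    def pdf(self, x: float, loc: float = 0.0, scale: float = 1.0) -> float:
        # Explicit, fully named parameters instead of *args/**kwargs.
        return std_pdf((x - loc) / scale) / scale

    # Create a fresh class with no shared base class to subclass from.
    return type(name, (), {"pdf": pdf})


def _std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


Normal = make_distribution("Normal", _std_normal_pdf)
```

Each generated class exposes a fixed, explicit signature, which is what makes it straightforward to annotate. How shape parameters, array inputs and broadcasting would fit into such a design is left open here, as it is in the message above.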
On 30/06/2021 20:06, Stefan van der Walt wrote:
> There have been some studies in JavaScript land that give rough metrics like "1/6 bugs could have been identified with typing" [0]. But then you see the Flask team annotating their entire project and finding almost none; probably because in Python we tend to test differently.

It likely also depends on the number and type of bugs that are there in the first place. If one takes an arbitrary notebook or weakly tested code it would probably find more.

Though indeed it would be good to have some quantifiable measurement of how useful typing is to downstream users.

Roman
participants (8)
- Evgeni Burovski
- josef.pktd@gmail.com
- Pamphile Roy
- Ralf Gommers
- Roman Yurchak
- Serge Guelton
- Stefan van der Walt
- Tyler Reddy