Is there any official position on PEP484/mypy?
Hi,
[If you're also on the numpy mailing list and get a similar version of this message, I apologise for that.]
I work at Machinalis, where we use scikit-learn a lot (and the pydata stack in general). Recently we've also been getting involved with mypy, a tool that type checks annotated Python code (not at runtime; think of it as a linter). The way of annotating Python types has recently been standardized in PEP 484.
As part of that involvement we've started creating type annotations for the Python libraries we use most, which include both numpy and scikit-learn. Mypy provides a way to specify types with annotations in separate files in case you don't have control over a library, so we have created an initial proof of concept for numpy at [1], and we are actively improving it. You can find some additional information about it, and some problems we've found on the way, in this blog post [2]. We were planning to also start some work on scikit-learn (which has a much larger surface area than numpy, so we would probably focus on small parts for now); we had to start with numpy anyway, given that SKL depends on it.
What I wanted to ask is whether the people involved in the SKL project are aware of PEP 484 annotations and whether you have some interest in starting to use them. The main benefit is that annotations serve as clear (and automatically testable) documentation for users. Secondary benefits are that users discover bugs more quickly and that some IDEs (like PyCharm) are starting to use this information for smart editor features (autocompletion, online checking, refactoring tools); eventually tools like Jupyter could take advantage of these annotations too. And the cost of writing and including them is relatively low.
We're doing the work anyway, but contributing our typespecs back could make it easier for users to benefit from this, and for us to maintain it and keep it in sync with future releases.
If you've never heard about PEP 484 or mypy (it happens a lot) I'll be happy to clarify anything about it that might help understand this situation.
Thanks!
D.
[1] https://github.com/machinalis/mypy-data
[2] http://www.machinalis.com/blog/writing-type-stubs-for-numpy/
-- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset
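A stub file of the kind mentioned in the message contains only signatures. The following is a minimal sketch for a hypothetical module `mylib`; the module, function, and class names are invented for illustration and are not taken from the mypy-data repository. In practice the same text would live in a file such as `mylib.pyi`, leaving the library source untouched.

```python
# Contents one might put in a hypothetical stub file "mylib.pyi".
# Stub syntax is ordinary Python 3 syntax with "..." in place of bodies
# and default values, so it can also be pasted into an interpreter.
from typing import List, Optional


def load_values(path: str, limit: Optional[int] = ...) -> List[float]: ...


class Scaler:
    factor = ...  # type: float

    def __init__(self, factor: float = ...) -> None: ...
    def apply(self, values: List[float]) -> List[float]: ...
```

Because the stub carries no implementation, it only documents and constrains how callers use the names it declares.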
Hi Daniel.
This hasn't been brought up before so there is no "official position". I am generally in favor, though I'm not sure how doable it is. We are generally pretty generous in accepting all kinds of inputs, and many of our options can have different types: (None, int, float, string, nd-array) is relatively common as a type for an option. As we still support 2.6, we would need to do comments or external files.
As a user, you are probably most interested in the outputs, right? The types returned by scikit-learn could probably be auto-generated.
I'm curious to see what others think. I'd be surprised if anyone is willing to invest a large amount of time on this, though if you guys want to contribute, we might be able to work something out.
Andy
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
Hi Andreas, thanks for the reply,
I think many arguments may end up with open specs, but even specifying "Union[None, int, float, str, np.ndarray]" might be useful. I actually expect to be able to provide stricter types using generics (you can say things like "this is a list of floats" or "this is a classifier on float features and str labels").
Not only the outputs are relevant; this is also useful to detect some silly mistakes like wrong numbers of arguments or misspellings in method names. But yes, result types can also detect errors like "some_classifier.fit_transform(X).predict(X)" (which I've seen in carelessly refactored code :) ).
The good thing about scikit-learn is that most methods already have information about types in docstrings, so it should be easy for us to move forward and validate what we're doing (and the docstrings ;-) ).
Given that there's interest (or at least no opposition) in this, I feel inclined to create a fork and start adding annotations (the comment-based ones) into the code (which is better suited for this scenario than creating external type stubs). We're already putting work into it and would be more than happy to turn it into a contribution if it works well (which is a "real" if, given that this is still experimental terrain).
Best, D.
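The two ideas in Daniel's reply, a loose `Union` for permissive options and stricter generic types, can be sketched with the comment-based syntax. The function names and the `max_features` option below are hypothetical and only illustrate the notation; `str` stands in for "string", and a plain list of floats stands in for an array so that the sketch needs nothing beyond the `typing` module.

```python
from typing import List, Union


def resolve_max_features(max_features, n_features):
    # type: (Union[None, int, float, str], int) -> int
    """Hypothetical helper: turn a permissive option into a feature count."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features == 'sqrt':
            return max(1, int(n_features ** 0.5))
        raise ValueError('unknown option: %r' % max_features)
    if isinstance(max_features, float):
        # A float is read as a fraction of the available features.
        return max(1, int(max_features * n_features))
    return max_features


def mean(values):
    # type: (List[float]) -> float
    """Generic container type: "this is a list of floats"."""
    return sum(values) / len(values)
```

With these comments in place, a checker such as mypy would be expected to reject a call like `mean('abc')` before the code is ever run, while the permissive option still accepts every form listed in its `Union`.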
On 07/28/2016 11:55 AM, Daniel Moisset wrote:
> Given that there's interest (or at least no opposition) on this I feel inclined to create a fork and start adding annotations (the comment-based ones) into the code (which is better suited for this scenario than creating external type stubs). We're already putting work into it and would be more than happy to turn it into a contribution if it works well (which is a "real" if given that this is still an experimental terrain)
That sounds good to me. But please keep in mind that I don't speak for the project, so if it works well, we would still need to achieve consensus within the project that this is something useful.
If you find some bugs with the annotations and mypy, that would probably prove its value to some degree [and if you don't, I might be inclined to argue it's not working well ;]
Joel, Olivier, Gael, anyone else?: opinions?
On Thu, Jul 28, 2016 at 5:04 PM, Andreas Mueller <t3kcit@gmail.com> wrote:
> That sounds good to me. But please keep in mind that I don't speak for the project, so if it works well, we would still need to achieve consensus within the project that this is something useful.
Of course; again I'm doing it anyway because I need/want it. So I'm fine if you end up not using it; but I'd be happier if it's useful for people outside our team.
> If you find some bugs with the annotations and mypy, that would probably prove its value to some degree [and if you don't, I might be inclined to argue it's not working well ;]
Heh, most mature software in Python doesn't have many *type* bugs (which tend to be more superficial), so I don't think I'll change much. I have already played a bit with annotating some 3rd party code [1] and I did not find bugs, but I found some opportunities to make code more readable and/or simple.
@Matthew: Regarding 2.6 vs 2.7, I don't believe it changes anything with respect to the effort needed here (2.7 vs 3.x would make a bigger difference, but I know that will take some time).
Best, D.
On Thu, Jul 28, 2016 at 12:04:48PM -0400, Andreas Mueller wrote:
> If you find some bugs with the annotations and mypy, that would probably prove its value to some degree [and if you don't, I might be inclined to argue it's not working well ;]
> Joel, Olivier, Gael, anyone else?: opinions?
The only reservation that I might have is with regard to the maintainability of these annotations. I am afraid that they will code-rot.
Daniel, any comments on that concern?
Cheers, Gaël
On 07/28/2016 12:43 PM, Gael Varoquaux wrote:
> The only reservation that I might have is with regard to the maintainability of these annotations. I am afraid that they will code-rot.
> Daniel, any comments on that concern?
We can put mypy in the CI, right? Shouldn't that prevent it from rotting? [I don't actually know. Daniel?]
I am not a core dev but just wanted to say that I like the idea of adding static type checking a lot, Daniel. Coincidentally, I just listened to the Podcast.__init__ episode on mypy a few weeks ago and was planning to use it in my personal + research projects as well.
I think the “normal” scikit-learn user would not really benefit from it (since the docstrings are already pretty good and thorough), but I think that it can be immensely useful for devs and contributors (and for augmenting the unit tests).
> We can put mypy in the CI, right? Shouldn't that prevent it from rotting?
Yeah, it can be added to Travis CI checks, for example.
One question though, are you planning to apply the “whole” type checking syntax? E.g.,
    def hello(r: int, c=5) -> str:
        s = 'hello'  # type: str
        return '(%d + %d) times %s' % (r, c, s)
Does this work with Python 2.7, 3.4 etc.?
Or are you only thinking about the “comment” syntax? E.g.,
    def hello(r, c=5):
        s = 'hello'  # type: str
        return '(%d + %d) times %s' % (r, c, s)
Which should work on all Py versions.
Best, Sebastian
Agreed that it sounded great on Podcast.__init__, and I specifically thought it would be helpful for sklearn, as someone who is not a developer but has been digging into the codebase. If anyone on the list wants an overview of mypy, I highly recommend listening to that episode: http://podcastinit.podbean.com/e/episode-65-mypy-with-david-fisher-and-greg-...
-- David Nicholson, Ph.D. Candidate, Sober Lab <http://www.biology.emory.edu/research/Sober/Home.html>, Emory Neuroscience Program <http://www.emory.edu/NEUROSCIENCE/>. www.nicholdav.info; https://github.com/NickleDave
@Andreas, @Gael:
This indeed is something that could be included in the CI, and you could ensure that the annotations have both internal consistency (i.e., what they say matches what the implementation is doing) and external consistency (the way callers use the code matches what the annotations declare).
To clarify a bit, there are 2 things involved here:
* PEP 484 provides a standard way to add annotations. Those have some syntax, but they are just metadata that gets stored and has no effect whatsoever at runtime, kind of like a structured docstring. They have no protection by themselves against bitrot (in the same way that docstrings may rot).
* mypy is a tool that can be run like a linter; it parses the code and the annotations, and verifies that they are consistent. It's something that you can use on CI or while developing (comparable to a linter like flake8, but doing a deeper analysis).
The typing described by PEP 484 is "gradual" (you don't have to cover the whole code, only the parts where static typing makes sense, and "unchecked" code is not modified). mypy respects that and also provides a way to silence the checker for situations where the type system is oversensitive but you know you're right (similar to flake8's "# noqa").
@Sebastian:
I had heard the podcast and it makes a very strong argument for using it, thanks for recommending it (people at Dropbox are using this on their production codebase).
I do believe that end users will start getting benefits from this that are stronger than docstrings, especially when this tooling starts to get integrated in code editors (PyCharm is already doing this) so they can get inline checking and detection of errors when they call the scikit-learn API, and better context-aware completion. That's not counting those users that want to use mypy in their own codebases and would get a bigger advantage if SKL supported it (that's the situation I am in, together with some colleagues).
Regarding syntax, if we add inline annotations (which IMO is the best path forward if they have a chance of getting integrated), the only option is using the 2.x compatible annotations (which are comments). That is different from your 2 examples; it would be:
    def hello(r, c=5):
        # type: (int, int) -> str
        s = 'hello'
        return '(%d + %d) times %s' % (r, c, s)
(note that your "# type: str" is valid but not required, given that s can obviously be inferred to be a string)
Another possible syntax (both are valid, this one makes sense for longer signatures) is:
    def hello(r,  # type: int
              c=5):
        # type: (...) -> str
        s = 'hello'
        return '(%d + %d) times %s' % (r, c, s)
(in this case there's again no need to specify a type for c given that it can be inferred as an int)
These 2 variants work well in 2.x and 3.x.
Best, D.
P.S.: In my last email I forgot to put this link describing some of the things that I've found on real code: http://www.machinalis.com/blog/a-day-with-mypy-part-1/
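The point that annotations are inert metadata can be checked directly. In this sketch (the function names are made up for the example) the Python 3 syntax stores its annotations on the function object, the comment syntax stores nothing, and neither changes what happens at runtime; only a checker run over the source would flag the last call.

```python
def double(x: int) -> int:
    return x * 2


def double_commented(x):
    # type: (int) -> int
    return x * 2


# Python 3 annotations are stored as plain metadata on the function object.
stored = double.__annotations__
# Type comments are ordinary comments, so nothing is stored at all.
stored_commented = double_commented.__annotations__

# Neither form is enforced at runtime: passing a str "works" (it repeats
# the string), although a static checker would report the call.
# "# type: ignore" is the PEP 484 way to silence the checker on one line.
result = double('ab')  # type: ignore
```

This is why the annotations alone cannot prevent rot: something has to run the checker, for example as a CI step.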
Thanks for the update, Daniel. The Py 2.x compatible alternatives,
    def hello(r, c=5):
        # type: (int, int) -> str
        …
are neat, and I didn’t know about these. Although, I must say that
    def hello(r: int, c=5) -> str:
        …
is a tad more useful, for example, in Jupyter Notebooks/IPython regarding the shift-tab function help. However, I’d say that your suggestion is the best bet for now to maintain Py 2.x compatibility (until 2020 maybe :P). Cheers, Sebastian
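The difference Sebastian mentions for interactive help can be seen with the standard `inspect` module, which is one way tools can read a signature (whether a particular notebook front end uses exactly this call is an assumption here). A small sketch, reusing the `hello` example under two hypothetical names:

```python
import inspect


def hello_py3(r: int, c=5) -> str:
    s = 'hello'
    return '(%d + %d) times %s' % (r, c, s)


def hello_comment(r, c=5):
    # type: (int, int) -> str
    s = 'hello'
    return '(%d + %d) times %s' % (r, c, s)


# Inline annotations are part of the signature that introspection sees.
sig_py3 = inspect.signature(hello_py3)
# Type comments are invisible to introspection; only names and defaults remain.
sig_comment = inspect.signature(hello_comment)

print(sig_py3)
print(sig_comment)
```

Both functions behave identically when called; the trade-off is only between richer introspection and compatibility with Python 2.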
Hi Daniel.
Thanks for your clarification, that's exactly how I understood it to work from what I saw so far.
I don't like either annotation type in terms of syntax that much. I don't understand why they didn't go with something closer to the Python 3 syntax. But I guess we have to live with it or write a new PEP ;)
I think the one-line version should be preferred as it is shorter and less intrusive.
Best, Andy
I am still worried that this is going to add even more complexity to contributing: people will contribute without knowing type hints, CI will break, they won't understand why it breaks, won't be able to reproduce it, and it will stall PRs.
Can you summarize once again in very simple terms what would be the big benefits?
I've been using mypy on a much smaller codebase I've been developing. The main benefits are:
1- Much nicer IDE experience when using something like PyCharm. I expect more text editors to start supporting this in the future.
2- An additional way to catch some compile time errors early on. For a codebase as mature as scikit-learn, that's probably not a huge deal.
3- Makes it nicer for other codebases using mypy to use scikit-learn.
Of those, the main benefit is by far 1. I also think that the opportunity cost is very low: annotations are easy to keep up to date, and the annotation syntax is really very simple.
On 07/30/2016 03:20 AM, Alexandre Gramfort wrote:
>> I am still worried that this is going to add even more complexity to contributing: people will contribute without knowing type hints, CI will break, they won't understand why it breaks, won't be able to reproduce it, and it will stall PRs.
> +1
> same feeling here.
That is "only" a concern if they change the type of something that is annotated, right? Is that something that happens often? I guess it would also break if someone adds an argument to an annotated function; that might happen more often.
On Fri, Jul 29, 2016 at 8:57 PM, Gael Varoquaux < gael.varoquaux@normalesup.org> wrote:
> Can you summarize once again in very simple terms what would be the big benefits?
Benefits for regular scikit-learn users:
1. Reliable information on method signatures in a standardized way ("reliable" in the sense of "automatically verified")
2. Better integration with tools supporting PEP 484 (editors, documentation tools). This is a small set now, but I expect it to grow (and it's also a chicken-and-egg problem; support has to start somewhere)
Benefits for scikit-learn users also using mypy and/or PEP 484 (probably not a large set, but I know a few people :) ):
0. Same as the rest of the users
1. Early detection of errors in their own code while writing code based on SKL
2. Making their own code more readable/explicit by annotating functions that receive/return SKL types (and verifying those annotations)
Benefits for scikit-learn developers:
1. Some extra checks that changes keep internal consistency
2. (Future) possible simplification of typing information in docstrings, which would become redundant (this would require updating doc generators)
Regarding the cost for contributing, a scenario where you get a CI error due to mypy would be because:
* the change in the code somehow changed the existing accepted/returned types, which is a change in the API and should actually be verified
* the change in the code extended the signature of an existing function (what Andreas mentioned); in this situation it's similar to a PR that adds an argument and doesn't update the docstring (only that this is automatically caught).
WRT the second issue, the error here might be confusing when using the "one line" syntax because arguments may "misalign" with their signatures. The multiline version (or the Python 3-only form) is safer in that sense (in fact, adding an argument there will not produce a CI problem because it's unannotated and assumed to be "any type").
Adding new modules/methods without annotations wouldn't produce an error, just an incompleteness in the annotations.
A possible source of problems like the one you mention is that the implementation of the annotated methods will be checked, and sometimes you'll get a warning about a local variable if mypy can't infer its type (it happens sometimes when assigning an empty list to a local, where mypy knows that it's a list but doesn't know the element type). But in that case I think the message you get is very obvious.
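Two of the situations described above can be made concrete. The sketch below uses hypothetical function names; the comments describing checker behaviour restate the message above and are not output captured from mypy.

```python
from typing import List


def scale_one_line(values, factor=1.0):
    # type: (List[float], float) -> List[float]
    # One-line form: the comment lists one type per argument, so adding a
    # third argument without editing the comment leaves the two out of step.
    return [v * factor for v in values]


def scale_multi_line(values,      # type: List[float]
                     factor=1.0,  # type: float
                     offset=0.0,  # newly added argument, left unannotated
                     ):
    # type: (...) -> List[float]
    # Per-argument form: each type sits next to its argument, and an
    # argument without a type comment is simply left unchecked.
    return [v * factor + offset for v in values]


def collect_positive(values):
    # type: (List[float]) -> List[float]
    # An empty list carries no element type, so it is annotated explicitly.
    kept = []  # type: List[float]
    for v in values:
        if v > 0:
            kept.append(v)
    return kept
```

The per-argument form costs more lines, which is the trade-off against the shorter one-line comment.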
A couple of things I forgot to mention:
* One relevant consequence is that, to add annotations in the code, scikit-learn would have to depend on the "typing" [1] module, which contains some of the basic names imported and used in annotations. It's a stdlib module in Python 3.5, and the PyPI package backports it to Python 2.7 and newer (I'm not sure how it works with Python 2.6, which might be an issue).
* As an example of the kind of bugs that mypy can find, someone here already found a documentation bug in the sklearn.svm.SVC() initializer: the "kernel" parameter is described as "string" [2], when it's actually a "string or callable" (which can be read in the "small print" description of the argument). That kind of slip would be automatically prevented if the type were declared as an annotation, with mypy on the CI. The signature of the callable would also be clear directly, instead of requiring looking up additional documentation on kernel functions or digging into the source.
[1] https://pypi.python.org/pypi/typing
[2] http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklear...
On Fri, Jul 29, 2016 at 8:57 PM, Gael Varoquaux < gael.varoquaux@normalesup.org> wrote:
Can you summarize once again in very simple terms what would be the big benefits?
Benefits for regular scikit-learn users
1. Reliable information on method signatures in a standarized way ("reliable" in the sense of "automatically verified") 2. Better integration with tools supporting PEP-484 (editors, documentation tools). This is a small set now, but I expect it to grow (and it's also an egg and chicken problem, support has to start somewhere)
Benefits for scikit-learn users also using mypy and/or PEP-484 (probably not a large set, but I know a few people :) )
0. Same as the rest of the users 1. Early detection of errors in own code while writing code based on SKL 2. Making own code more readable/explicit by annotating functions that receive/return SKL types (and verifying that annotations)
Benefits for scikit-learn developers
1. Some extra checks that changes keep internal consistency 2. (Future) possible simplification of typing information in docstrings, which would make themselves redundant (this would require updating doc generators)
Regarding the cost for contributing, an scenario where you get a CI error due to mypy would be because:
* the change in the code somewhat changed the existing accepted/returned types, which is a change in the API and should actually be verified * the change in the code extended the signature of an existing function (what Andreas mentioned); in this situation it's similar to a PR that adds an argument and doesn't update the docstring (only that this is automatically caught).
WRT to the second issue, the error here might be confusing when using the "one line" syntax because arguments may "misalign" with their signatures. The multiline version (or the python3-only form) is safer in that sense (in fact, adding an argument there will not produce a CI problem because its unannotated and assumed to be "any type").
Adding new modules/methods without no annotations wouldn't produce an error, just an incompleteness in the annotations
A possible source of problems like the one you mention is that the implementation of the annotated methods will be checked, and sometimes you'll get a warning about a local variable if mypy can't infer its type (it happens sometimes when assigning an empty list to a local, where mypy knows that it's a list but doesn't know the element type). But in that case I think the message you get is very obvious.
-- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset
-- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset
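As an illustration of the "string or callable" point, this is roughly how such a parameter could be declared. Everything here is a simplified stand-in: resolve_kernel, Vector and KernelFunc are hypothetical names, and the callable signature is deliberately reduced to two plain vectors rather than whatever SVC really expects.

```python
from typing import Callable, Sequence, Union

# Hypothetical aliases, for illustration only.
Vector = Sequence[float]
KernelFunc = Callable[[Vector, Vector], float]


def _linear(a, b):
    # type: (Vector, Vector) -> float
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def resolve_kernel(kernel='linear'):
    # type: (Union[str, KernelFunc]) -> KernelFunc
    # Accept either a known kernel name or a user-supplied callable.
    if callable(kernel):
        return kernel
    if kernel == 'linear':
        return _linear
    raise ValueError('unknown kernel: %r' % (kernel,))
```

With a declaration like this, both the accepted types and the shape of the callable are visible in one place and can be checked on the CI.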
I certainly see the benefit, and think we would benefit also from finding test coverage holes wrt input type. But I think without ndarray/sparse matrix type support, we're not going to be able to annotate most of our code in sufficient detail.
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
* One relevant consequence is that, to add annotations on the code, scikit-learn should depend on the "typing"[1] module which contains some of the basic names imported and used in annotations. It's a stdlib module in python 3.5, but the PyPI package backports it to python 2.7 and newer (I'm not sure how it works with Python 2.6, which might be an issue)
I am afraid that this is going to be a problem: we have a no dependency policy (beyond numpy and scipy).
On 08/02/2016 01:48 PM, Gael Varoquaux wrote:
I am afraid that this is going to be a problem: we have a no dependency policy (beyond numpy and scipy).
I still think this is a point we should discuss further ;)
If the dependency is really a showstopper, bundling could be an option. The module is a single, pure Python file, so that shouldn't complicate things much.
@Joel, regarding «without ndarray/sparse matrix type support, we're not going to be able to annotate most of our code in sufficient detail»
That shouldn't be a problem: we have already written some working support for numpy at https://github.com/machinalis/mypy-data, so it's possible to annotate ndarrays and matrix types (scipy.sparse is not covered yet; I could take a look into that).
Best, D.
-- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset
Another point about the dependency: it is not required at run time, only to run the type checker. You could easily put the import in a try/except block and people running scikit-learn wouldn't need it.
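A minimal sketch of that guarded import, assuming the annotations are written as comments so that nothing from "typing" is actually used when the code runs. The function top_k is a made-up example, not scikit-learn code.

```python
try:
    # Only the type checker needs these names.
    from typing import List, Optional
except ImportError:
    # Assumption: the annotations below are comments, so the code still
    # runs when the "typing" package is not installed.
    pass


def top_k(scores, k=None):
    # type: (List[float], Optional[int]) -> List[float]
    # Return the scores in descending order, truncated to k if given.
    ordered = sorted(scores, reverse=True)
    return ordered if k is None else ordered[:k]
```

This only works with the comment-based syntax; inline Python 3 annotations that mention List or Optional would need the names to exist at import time.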
Hi, On Wed, Jul 27, 2016 at 10:08 PM, Andreas Mueller <t3kcit@gmail.com> wrote:
Hi Daniel. This hasn't been brought up before so there is no "official position". I am generally in favor, though I'm not sure how doable it is. We are generally pretty generous in accepting all kinds of inputs, and many of our options can have different types: (None, int, float, string, nd-array) is relatively common as a type for an option. As we still support 2.6, we would need to do comments or external files.
Given numpy has dropped support for 2.6, maybe it would be reasonable for scikit-learn to do the same, to make this process easier? Cheers, Matthew
On 07/28/2016 12:03 PM, Matthew Brett wrote:
Given numpy has dropped support for 2.6, maybe it would be reasonable for scikit-learn to do the same, to make this process easier?
How would it change the process? We have been discussing this. My stance is that we should drop it as soon as it creates a major nuisance, but not just for the sake of dropping it.
On Thu, Jul 28, 2016 at 5:10 PM, Andreas Mueller <t3kcit@gmail.com> wrote:
Given numpy has dropped support for 2.6, maybe it would be reasonable for scikit-learn to do the same, to make this process easier?
How would it change the process? We have been discussing this. My stance is that we should drop it as soon as it creates a major nuisance, but not just for the sake of dropping it.
Ah - sorry - I misunderstood this:
As we still support 2.6, we would need to do comments or external files.
to mean 2.6 specifically, rather than 2.x. Cheers, Matthew
On Thu, Jul 28, 2016 at 12:10:03PM -0400, Andreas Mueller wrote:
Given numpy has dropped support for 2.6, maybe it would be reasonable for scikit-learn to do the same, to make this process easier?
How would it change the process? We have been discussing this. My stance is that we should drop it as soon as it creates a major nuisance, but not just for the sake of dropping it.
Same feeling here.
participants (9)
- Alexandre Gramfort
- Andreas Mueller
- Daniel Moisset
- David Nicholson
- federico vaggi
- Gael Varoquaux
- Joel Nothman
- Matthew Brett
- Sebastian Raschka