[scikit-learn] Is there any official position on PEP484/mypy?

Daniel Moisset dmoisset at machinalis.com
Thu Jul 28 11:55:17 EDT 2016


Hi Andreas, thanks for the reply,

I think many arguments may end up with open specs, but even specifying
"Union[None, int, float, str, np.ndarray]" might be useful. I actually
expect to be able to provide stricter types using generics (you can say
things like "this is a list of floats" or "this is a classifier on float
features and str labels"). It's not only the outputs that are relevant:
annotations are useful for detecting silly mistakes like a wrong number of
arguments or misspelled method names. But yes, result types can also
detect errors like "some_classifier.fit_transform(X).predict(X)" (which
I've seen in carelessly refactored code :) ). The good thing about
scikit-learn is that most methods already have type information in their
docstrings, so it should be easy for us to move forward and validate what
we're doing (and the docstrings ;-) ).
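
As a rough illustration of the two ideas above (a loose Union for a
permissive option, and a stricter generic classifier type), here is a
minimal sketch. The names Option, describe_option and MajorityClassifier
are made up for this example and are not scikit-learn API; the np.ndarray
case is left out so the snippet has no dependencies beyond the standard
library.

```python
from collections import Counter
from typing import Generic, List, Optional, TypeVar, Union

# A loose "open spec" for a permissive option (np.ndarray omitted here).
Option = Union[None, int, float, str]

FeatureT = TypeVar("FeatureT")
LabelT = TypeVar("LabelT")


def describe_option(value: Option) -> str:
    """Return a short description of which kind of value was passed."""
    return "default" if value is None else type(value).__name__


class MajorityClassifier(Generic[FeatureT, LabelT]):
    """Hypothetical toy classifier: predicts the most common training label."""

    def __init__(self) -> None:
        self._label = None  # type: Optional[LabelT]

    def fit(self, X: List[List[FeatureT]],
            y: List[LabelT]) -> "MajorityClassifier[FeatureT, LabelT]":
        # Remember the most frequent label seen during training.
        self._label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: List[List[FeatureT]]) -> List[LabelT]:
        if self._label is None:
            raise ValueError("call fit() before predict()")
        return [self._label for _ in X]


# "a classifier on float features and str labels"
clf = MajorityClassifier()  # type: MajorityClassifier[float, str]
clf.fit([[0.1], [0.2], [0.9]], ["spam", "spam", "ham"])
predictions = clf.predict([[0.5], [0.7]])
```

With signatures like these, a checker such as mypy would be expected to
complain about a misspelled method (say, clf.predikt(...)) or about
calling .predict() on something that is a plain list rather than a
classifier, without running the code.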

Given that there's interest (or at least no opposition) in this, I feel
inclined to create a fork and start adding annotations (the comment-based
ones) into the code (which is better suited to this scenario than creating
external type stubs). We're already putting work into it and would be more
than happy to turn it into a contribution if it works well (which is a
"real" if, given that this is still experimental terrain).
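
For anyone who hasn't seen the comment-based form, here is a minimal
sketch of what such an annotation looks like under PEP 484. The function
clip_value is a hypothetical example, not scikit-learn code.

```python
from typing import Optional


def clip_value(value, lower=None, upper=None):
    # type: (float, Optional[float], Optional[float]) -> float
    """Clamp value to [lower, upper]; a bound of None means no limit."""
    if lower is not None and value < lower:
        return lower
    if upper is not None and value > upper:
        return upper
    return value


clipped = clip_value(12.5, lower=0.0, upper=10.0)
```

The signature comment is an ordinary comment to the interpreter, so the
function definition itself needs no Python 3 syntax. An external stub
would carry the same signature in a separate .pyi file instead of next to
the implementation.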

Best,
    D.


On Wed, Jul 27, 2016 at 10:08 PM, Andreas Mueller <t3kcit at gmail.com> wrote:

> Hi Daniel.
> This hasn't been brought up before so there is no "official position".
> I am generally in favor, though I'm not sure how doable it is.
> We are generally pretty generous in accepting all kinds of inputs, and
> many of our options can have different types: (None, int, float, string,
> nd-array) is relatively common as a type for an option.
> As we still support 2.6, we would need to do comments or external files.
>
> As a user, you are probably most interested in the outputs, right? The
> types returned by scikit-learn could probably be auto-generated.
>
> I'm curious to see what others think.
> I'd be surprised if anyone is willing to invest a large amount of time on
> this, though if you guys want to contribute,
> we might be able to work something out.
>
> Andy
>
>
>
> On 07/27/2016 03:17 PM, Daniel Moisset wrote:
>
> Hi,
>
> [If you're also on the numpy mailing list and get a similar version of the
> message, I apologise for that]
>
> I work at Machinalis, where we use a lot of scikit-learn (and the pydata
> stack in general). Recently we've also been getting involved with mypy,
> which is a tool to type check (not at runtime, think of it as a linter)
> annotated Python code (the way of annotating Python types has recently
> been standardized in PEP 484).
>
> As part of that involvement we've started creating type annotations for
> the Python libraries we use most, which include both numpy and
> scikit-learn. Mypy provides a way to specify types with annotations in
> separate files in case you don't have control over a library, so we have
> created an initial proof of concept for numpy at [1], and we are actively
> improving it. You can find some additional information about it and some
> problems we've found along the way in this blog post [2]. We were planning to
> also start some work on scikit-learn (which has a much larger surface area
> than numpy, so probably focusing on small parts for now); we had to start
> with numpy anyway given that SKL depends on it.
>
> What I wanted to ask is whether the people involved in the SKL project are
> aware of PEP 484 annotations and whether you have some interest in starting
> to use them. The main benefit is that annotations serve as clear (and
> automatically testable) documentation for users; secondary benefits are
> that users discover bugs more quickly and that some IDEs (like PyCharm)
> are starting to use this information for smart editor features
> (autocompletion, online checking, refactoring tools); tools like
> Jupyter could eventually take advantage of these annotations too. And the
> cost of writing and including these is relatively low.
>
> We're doing the work anyway, but contributing our typespecs back could
> make it easier for users to benefit from this, and for us to maintain it
> and keep it in sync with future releases.
>
> If you've never heard about PEP 484 or mypy (it happens a lot) I'll be
> happy to clarify anything about it that might help in understanding this
> situation.
>
> Thanks!
>
> D.
>
>
> [1] https://github.com/machinalis/mypy-data
> [2] http://www.machinalis.com/blog/writing-type-stubs-for-numpy/
>
> --
> Daniel F. Moisset - UK Country Manager
> www.machinalis.com
> Skype: @dmoisset
>
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>


-- 
Daniel F. Moisset - UK Country Manager
www.machinalis.com
Skype: @dmoisset