[scikit-learn] [ANN] Scikit-learn 0.20.0

Thu Oct 4 11:40:02 EDT 2018


On 10/03/2018 03:32 PM, Nick Pentreath wrote:
> For ONNX you may be interested in 
> https://github.com/onnx/onnxmltools - which supports conversion of a 
> few sklearn models to ONNX already.
>
> However as far as I am aware, none of the ONNX backends actually 
> support the ONNX-ML extended spec (in open-source at least). So you 
> would not be able to actually do prediction I think...
Exactly, that's what I'm waiting for. MS is working on it, afaik.

>
> As for PFA, to my current knowledge there is no library that does it 
> yet. Our own Aardpfark project 
> (https://github.com/CODAIT/aardpfark) focuses on SparkML export to PFA 
> for now but would like to add sklearn support in the future.
>
>
> On Wed, 3 Oct 2018 at 20:07 Sebastian Raschka 
> <mail at sebastianraschka.com> wrote:
>
>     The ONNX-approach sounds most promising, esp. because it will also
>     allow library interoperability but I wonder if this is for
>     parametric models only and not for the nonparametric ones like
>     KNN, tree-based classifiers, etc.
>
>     All-in-all I can definitely see the appeal for having a way to
>     export sklearn estimators in a text-based format (e.g., via JSON),
>     since it would make sharing code easier. This doesn't even have to
>     be compatible with multiple sklearn versions. A typical use case
>     would be to include these JSON exports as e.g., supplemental files
>     of a research paper for other people to run the models etc. (here,
>     one can just specify which sklearn version it would require; of
>     course, one could also share pickle files, but I am personally
>     always hesitant reg. running/trusting other people's pickle files).
>
>     Unfortunately though, as Gael pointed out, this "feature" would be
>     a huge burden for the devs, and it would probably also negatively
>     impact the development of scikit-learn itself because it imposes
>     another design constraint.
>
>     However, I do think this sounds like an excellent case for a
>     contrib project. Like scikit-export, scikit-serialize or sth like
>     that.
>
>     Best,
>     Sebastian
>
>
>
>     > On Oct 3, 2018, at 5:49 AM, Javier López <jlopez at ende.cc> wrote:
>     >
>     >
>     > On Tue, Oct 2, 2018 at 5:07 PM Gael Varoquaux
>     > <gael.varoquaux at normalesup.org> wrote:
>     > The reason that pickles are brittle and that sharing pickles is a bad
>     > practice is that pickle uses an implicitly defined data model, which is
>     > defined via the internals of objects.
>     >
>     > Plus the fact that loading a pickle can execute arbitrary code, and
>     > there is no way to know in advance if any malicious code is in there,
>     > because the contents of the pickle cannot be easily inspected without
>     > loading/executing it.
>     >
>     > So, the problems of pickle are not specific to pickle, but rather
>     > intrinsic to any generic persistence code [*]. Writing persistence code
>     > that does not fall into these problems is very costly in terms of
>     > developer time and makes it harder to add new methods or improve
>     > existing ones. I am not excited about it.
>     >
>     > My "text-based serialization" suggestion was nowhere near as ambitious
>     > as that, as I have already explained, and wasn't aiming at solving the
>     > versioning issues, but rather at having something which is "about as
>     > good" as pickle but in a human-readable format. I am not asking for a
>     > Turing-complete language to reproduce the prediction function, but
>     > rather something simple in the spirit of the output produced by the
>     > gist code I linked above, just for the model families where it is
>     > reasonable:
>     >
>     > https://gist.github.com/jlopezpena/2cdd09c56afda5964990d5cf278bfd31
>     >
>     > The code I posted mostly works (specific cases of nested models need to
>     > be addressed separately, as well as pipelines), and we have been using
>     > (a version of) it in production for quite some time. But there are
>     > hackish aspects to it that we are not happy with, such as the manual
>     > separation of init and fitted parameters by checking if the name ends
>     > with "_", having to infer class name and location using
>     > "model.__class__.__name__" and "model.__module__", and the wacky use of
>     > "__import__".
>     >
>     > My suggestion was more along the lines of adding some metadata to
>     > sklearn estimators so that code in a similar style would be nicer to
>     > write; little things like having `init_parameters` and
>     > `fit_parameters` properties that would return the lists of named
>     > parameters, or a `model_info` method that would return data like
>     > sklearn version, class name and location, or a package-level
>     > dictionary pointing at the estimator classes by a string name, like
>     >
>     > from sklearn.linear_model import LogisticRegression
>     > estimator_classes = {"LogisticRegression": LogisticRegression, ...}
>     >
>     > so that one can load the appropriate class from the string description
>     > without calling __import__ or eval; that sort of stuff.
>     >
>     > I am aware this would not address the common complaint of "perfect
>     > prediction reproducibility" across versions, but I think we can all
>     > agree that this utopia of perfect reproducibility is not feasible.
>     >
>     > And in the long, long run, I agree that PFA/ONNX, or whichever similar
>     > format emerges, is the way to go.
>     >
>     > J
>     > _______________________________________________
>     > scikit-learn mailing list
>     > scikit-learn at python.org
>     > https://mail.python.org/mailman/listinfo/scikit-learn
>
>
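
For readers who want something concrete, the text-based export Javier
describes in the quoted message could look roughly like the sketch below.
It is not the code from his gist and not a scikit-learn API. To stay
dependency-free it uses a hypothetical stand-in class, TinyLinearModel,
that only imitates the scikit-learn convention (constructor arguments
exposed through get_params(), fitted values stored under names ending in
"_"). The names ESTIMATOR_CLASSES, to_json and from_json are likewise
assumptions made for illustration.

```python
import json


class TinyLinearModel:
    """Hypothetical stand-in that imitates scikit-learn naming conventions."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def get_params(self):
        # Constructor arguments only, in the spirit of sklearn's get_params().
        return {"alpha": self.alpha, "fit_intercept": self.fit_intercept}

    def fit(self, X, y):
        # Placeholder "fit": just stores column means and the mean of y
        # under trailing-underscore names. Not a real learning algorithm.
        self.coef_ = [sum(col) / len(col) for col in zip(*X)]
        self.intercept_ = sum(y) / len(y) if self.fit_intercept else 0.0
        return self


# Explicit registry (assumed name): classes are looked up by string,
# so loading never needs __import__ or eval.
ESTIMATOR_CLASSES = {"TinyLinearModel": TinyLinearModel}


def to_json(model):
    # Fitted state is recognised by the trailing underscore, which is the
    # same heuristic the quoted message calls hackish.
    fitted = {
        name: value
        for name, value in vars(model).items()
        if name.endswith("_") and not name.startswith("_")
    }
    payload = {
        "class": type(model).__name__,
        "init_params": model.get_params(),
        "fitted_params": fitted,
    }
    return json.dumps(payload, sort_keys=True)


def from_json(text):
    payload = json.loads(text)
    cls = ESTIMATOR_CLASSES[payload["class"]]  # KeyError if not registered
    model = cls(**payload["init_params"])
    for name, value in payload["fitted_params"].items():
        setattr(model, name, value)
    return model


model = TinyLinearModel(alpha=0.5).fit([[1.0, 2.0], [3.0, 4.0]], [0.0, 1.0])
text = to_json(model)
restored = from_json(text)
```

Unlike a pickle, the resulting JSON can be read before anything is
instantiated, and only classes listed in the registry can be created. With
real estimators, fitted attributes are usually NumPy arrays and would need
converting (for example with .tolist()) before json.dumps accepts them;
nested estimators and pipelines would need separate handling, as the quoted
message notes.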