[scikit-learn] [ANN] Scikit-learn 0.20.0

Nick Pentreath nick.pentreath at gmail.com
Wed Oct 3 15:32:03 EDT 2018


For ONNX you may be interested in https://github.com/onnx/onnxmltools -
which supports conversion of a few sklearn models to ONNX already.

However, as far as I am aware, none of the ONNX backends actually support
the ONNX-ML extended spec (in open source at least). So I think you would
not be able to actually run predictions...

As for PFA, to my current knowledge there is no library that does it yet.
Our own Aardpfark project (https://github.com/CODAIT/aardpfark) focuses on
SparkML export to PFA for now but would like to add sklearn support in the
future.


On Wed, 3 Oct 2018 at 20:07 Sebastian Raschka <mail at sebastianraschka.com>
wrote:

> The ONNX-approach sounds most promising, esp. because it will also allow
> library interoperability but I wonder if this is for parametric models only
> and not for the nonparametric ones like KNN, tree-based classifiers, etc.
>
> All-in-all I can definitely see the appeal for having a way to export
> sklearn estimators in a text-based format (e.g., via JSON), since it would
> make sharing code easier. This doesn't even have to be compatible with
> multiple sklearn versions. A typical use case would be to include these
> JSON exports as e.g., supplemental files of a research paper for other
> people to run the models etc. (here, one can just specify which sklearn
> version it would require; of course, one could also share pickle files, but
> I am personally always hesitant reg. running/trusting other people's pickle
> files).
>
> Unfortunately though, as Gael pointed out, this "feature" would be a huge
> burden for the devs, and it would probably also negatively impact the
> development of scikit-learn itself because it imposes another design
> constraint.
>
> However, I do think this sounds like an excellent case for a contrib
> project. Like scikit-export, scikit-serialize or sth like that.
>
> Best,
> Sebastian
>
>
>
> > On Oct 3, 2018, at 5:49 AM, Javier López <jlopez at ende.cc> wrote:
> >
> >
> > On Tue, Oct 2, 2018 at 5:07 PM Gael Varoquaux <gael.varoquaux at normalesup.org> wrote:
> > The reason that pickles are brittle and that sharing pickles is a bad
> > practice is that pickle uses an implicitly defined data model, which is
> > defined via the internals of objects.
> >
> > Plus the fact that loading a pickle can execute arbitrary code, and
> > there is no way to know if any malicious code is in there in advance
> > because the contents of the pickle cannot be easily inspected without
> > loading/executing it.
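A minimal sketch of the point about code execution on load, using only the standard library. The `Payload` class is hypothetical, and the harmless expression "1 + 1" stands in for whatever an attacker might choose to run; the mechanism shown is the `__reduce__` hook, which lets a pickle name any callable to be invoked at load time.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle stores a call, not its own state."""

    def __reduce__(self):
        # Tell pickle to rebuild this object by calling eval("1 + 1").
        # A hostile pickle could name any importable callable here.
        return (eval, ("1 + 1",))


blob = pickle.dumps(Payload())

# Loading runs the stored call; no Payload instance comes back at all.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Nothing on the loading side opts in to this: calling `pickle.loads` on the bytes is enough to trigger the call.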
> >
> > So, the problems of pickle are not specific to pickle, but rather
> > intrinsic to any generic persistence code [*]. Writing persistence code
> > that does not fall into these problems is very costly in terms of
> > developer time and makes it harder to add new methods or improve
> > existing ones. I am not excited about it.
> >
> > My "text-based serialization" suggestion was nowhere near as ambitious
> > as that, as I have already explained, and wasn't aiming at solving the
> > versioning issues, but rather at having something which is "about as
> > good" as pickle but in a human-readable format. I am not asking for a
> > Turing-complete language to reproduce the prediction function, but
> > rather something simple in the spirit of the output produced by the
> > gist code I linked above, just for the model families where it is
> > reasonable:
> >
> > https://gist.github.com/jlopezpena/2cdd09c56afda5964990d5cf278bfd31
> >
> > The code I posted mostly works (specific cases of nested models need to
> > be addressed separately, as well as pipelines), and we have been using
> > (a version of) it in production for quite some time. But there are
> > hackish aspects to it that we are not happy with, such as the manual
> > separation of init and fitted parameters by checking if the name ends
> > with "_", having to infer class name and location using
> > "model.__class__.__name__" and "model.__module__", and the wacky use of
> > "__import__".
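The export side of the approach described above could be sketched roughly as follows. This is not the code from the linked gist: `TinyMeanRegressor` and `describe` are hypothetical names, and the toy class only imitates the sklearn convention that fitted attributes end with "_". The class-loading step is left out here.

```python
import json


class TinyMeanRegressor:
    """Hypothetical stand-in that follows the sklearn naming convention."""

    def __init__(self, shift=0.0):
        self.shift = shift  # init parameter: no trailing underscore

    def fit(self, y):
        # Fitted attribute: name ends with an underscore.
        self.mean_ = sum(y) / len(y) + self.shift
        return self


def describe(model):
    """Split attributes into init and fitted parameters by the "_" suffix."""
    attrs = vars(model)
    return {
        "class": model.__class__.__name__,
        "module": model.__module__,
        "init": {k: v for k, v in attrs.items() if not k.endswith("_")},
        "fitted": {k: v for k, v in attrs.items() if k.endswith("_")},
    }


model = TinyMeanRegressor(shift=0.5).fit([1.0, 2.0, 3.0])
text = json.dumps(describe(model), indent=2, sort_keys=True)
print(text)
```

The result is plain JSON that can be read before anything is loaded, but the suffix check is exactly the kind of manual convention described above as hackish.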
> >
> > My suggestion was more along the lines of adding some metadata to
> > sklearn estimators so that code in a similar style would be nicer to
> > write; little things like having `init_parameters` and `fit_parameters`
> > properties that would return the lists of named parameters, or a
> > `model_info` method that would return data like sklearn version, class
> > name and location, or a package-level dictionary pointing at the
> > estimator classes by a string name, like
> >
> > from sklearn.linear_model import LogisticRegression
> > estimator_classes = {"LogisticRegression": LogisticRegression, ...}
> >
> > so that one can load the appropriate class from the string description
> > without calling __import__ or eval; that sort of stuff.
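A rough sketch of how such a name-to-class dictionary might be used when loading. The toy classes, `ESTIMATOR_CLASSES`, and `load_estimator` are hypothetical names for illustration; sklearn does not provide this registry. The point is that only classes listed in the dictionary can be instantiated, so an unknown name fails instead of triggering an import.

```python
class TinyMeanRegressor:
    """Hypothetical stand-in for an estimator class."""

    def __init__(self, shift=0.0):
        self.shift = shift


class TinyScaler:
    """Second hypothetical stand-in, so the registry has more than one entry."""

    def __init__(self, factor=1.0):
        self.factor = factor


# Explicit allow-list: string name -> class.
ESTIMATOR_CLASSES = {cls.__name__: cls for cls in (TinyMeanRegressor, TinyScaler)}


def load_estimator(description):
    """Rebuild an estimator from a dict without using __import__ or eval."""
    name = description["class"]
    try:
        cls = ESTIMATOR_CLASSES[name]
    except KeyError:
        raise ValueError(f"unknown estimator class: {name!r}") from None
    model = cls(**description["init"])
    # Restore fitted attributes (names ending in "_") directly.
    for attr, value in description["fitted"].items():
        setattr(model, attr, value)
    return model


restored = load_estimator(
    {"class": "TinyMeanRegressor", "init": {"shift": 0.5}, "fitted": {"mean_": 2.5}}
)
print(type(restored).__name__, restored.shift, restored.mean_)
```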
> >
> > I am aware this would not address the common complaint about "perfect
> > prediction reproducibility" across versions, but I think we can all
> > agree that this utopia of perfect reproducibility is not feasible.
> >
> > And in the long, long run, I agree that PFA/ONNX, or whichever similar
> > format emerges, is the way to go.
> >
> > J
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
>
>