[scikit-learn] [ANN] Scikit-learn 0.20.0
Andreas Mueller
t3kcit at gmail.com
Thu Oct 4 11:40:02 EDT 2018
On 10/03/2018 03:32 PM, Nick Pentreath wrote:
> For ONNX you may be interested in
> https://github.com/onnx/onnxmltools - which supports conversion of a
> few sklearn models to ONNX already.
>
> However as far as I am aware, none of the ONNX backends actually
> support the ONNX-ML extended spec (in open-source at least). So you
> would not be able to actually do prediction I think...
Exactly, that's what I'm waiting for. MS is working on it afaik.
>
> As for PFA, to my current knowledge there is no library that does it
> yet. Our own Aardpfark project
> (https://github.com/CODAIT/aardpfark) focuses on SparkML export to PFA
> for now but would like to add sklearn support in the future.
>
>
> On Wed, 3 Oct 2018 at 20:07 Sebastian Raschka
> <mail at sebastianraschka.com> wrote:
>
> The ONNX-approach sounds most promising, esp. because it will also
> allow library interoperability but I wonder if this is for
> parametric models only and not for the nonparametric ones like
> KNN, tree-based classifiers, etc.
>
> All-in-all I can definitely see the appeal for having a way to
> export sklearn estimators in a text-based format (e.g., via JSON),
> since it would make sharing code easier. This doesn't even have to
> be compatible with multiple sklearn versions. A typical use case
> would be to include these JSON exports as e.g., supplemental files
> of a research paper for other people to run the models etc. (here,
> one can just specify which sklearn version it would require; of
> course, one could also share pickle files, but I am personally
> always hesitant reg. running/trusting other people's pickle files).
>
> Unfortunately though, as Gael pointed out, this "feature" would be
> a huge burden for the devs, and it would probably also negatively
> impact the development of scikit-learn itself because it imposes
> another design constraint.
>
> However, I do think this sounds like an excellent case for a
> contrib project. Like scikit-export, scikit-serialize or sth like
> that.
>
> Best,
> Sebastian
>
>
>
> > On Oct 3, 2018, at 5:49 AM, Javier López <jlopez at ende.cc> wrote:
> >
> >
> > On Tue, Oct 2, 2018 at 5:07 PM Gael Varoquaux
> > <gael.varoquaux at normalesup.org> wrote:
> > The reason that pickles are brittle and that sharing pickles is a bad
> > practice is that pickle uses an implicitly defined data model, which is
> > defined via the internals of objects.
> >
> > Plus the fact that loading a pickle can execute arbitrary code, and
> > there is no way to know if any malicious code is in there in advance
> > because the contents of the pickle cannot be easily inspected without
> > loading/executing it.
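
To make the point about code execution concrete, here is a minimal sketch
using only the standard library. The class name `Payload` is made up for
illustration, and the "payload" is a harmless print call; a hostile pickle
could reference any importable callable instead. The opcode listing via
`pickletools` shows that a static look is possible, but it is a low-level
view and not a readable audit of what the pickle will do.

```python
import pickle
import pickletools


class Payload:
    """Harmless stand-in for a malicious object (hypothetical name)."""

    def __reduce__(self):
        # pickle records "call print(...)" and replays it at load time.
        return (print, ("this ran during pickle.loads",))


data = pickle.dumps(Payload())

# Static inspection without executing anything: walk the opcode stream.
# A REDUCE opcode means "call a callable with arguments" during loading.
opcodes = [op.name for op, arg, pos in pickletools.genops(data)]
has_call = "REDUCE" in opcodes

# Loading executes the recorded call; the result is print's return value.
result = pickle.loads(data)
```

Nothing about `Payload` has to be importable on the loading side, since the
pickle only refers to `print`; that is part of why an untrusted pickle is
hard to reason about.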
> >
> > So, the problems of pickle are not specific to pickle, but rather
> > intrinsic to any generic persistence code [*]. Writing persistence code
> > that does not fall into these problems is very costly in terms of
> > developer time and makes it harder to add new methods or improve
> > existing ones. I am not excited about it.
> >
> > My "text-based serialization" suggestion was nowhere near as ambitious
> > as that, as I have already explained, and wasn't aiming at solving the
> > versioning issues, but rather at having something which is "about as
> > good" as pickle but in a human-readable format. I am not asking for a
> > Turing-complete language to reproduce the prediction function, but
> > rather something simple in the spirit of the output produced by the
> > gist code I linked above, just for the model families where it is
> > reasonable:
> >
> > https://gist.github.com/jlopezpena/2cdd09c56afda5964990d5cf278bfd31
> >
> > The code I posted mostly works (specific cases of nested models need to
> > be addressed separately, as well as pipelines), and we have been using
> > (a version of) it in production for quite some time. But there are
> > hackish aspects to it that we are not happy with, such as the manual
> > separation of init and fitted parameters by checking if the name ends
> > with "_", having to infer class name and location using
> > "model.__class__.__name__" and "model.__module__", and the wacky use of
> > "__import__".
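
The approach described above (split attributes on a trailing "_", record
class name and module) can be sketched roughly as follows. This is not the
code from the gist; `TinyMeanModel` and `export_model` are hypothetical
names, and the toy class only imitates the sklearn naming convention so the
example runs without sklearn installed.

```python
import json


class TinyMeanModel:
    """Hypothetical stand-in that follows the sklearn naming convention."""

    def __init__(self, shift=0.0):
        self.shift = shift  # init parameter: no trailing underscore

    def fit(self, values):
        # Fitted attribute: trailing underscore, set only by fit().
        self.mean_ = sum(values) / len(values) + self.shift
        return self


def export_model(model):
    """Describe a fitted model as a plain dict (illustrative helper)."""
    attrs = vars(model)
    return {
        "class": model.__class__.__name__,
        "module": model.__module__,
        "init_params": {k: v for k, v in attrs.items() if not k.endswith("_")},
        "fitted_params": {k: v for k, v in attrs.items() if k.endswith("_")},
    }


exported = export_model(TinyMeanModel(shift=1.0).fit([1.0, 2.0, 3.0]))
# Assumption: every value is JSON-serializable. Real estimators hold numpy
# arrays, which would need converting (e.g. with .tolist()) first.
text = json.dumps(exported, indent=2, sort_keys=True)
```

The trailing-underscore check is exactly the part described as hackish: it
relies on a naming convention rather than on metadata the estimator
exposes.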
> >
> > My suggestion was more along the lines of adding some metadata to
> > sklearn estimators so that code in a similar style would be nicer to
> > write; little things like having `init_parameters` and `fit_parameters`
> > properties that would return the lists of named parameters, or a
> > `model_info` method that would return data like sklearn version, class
> > name and location, or a package-level dictionary pointing at the
> > estimator classes by a string name, like
> >
> > from sklearn.linear_model import LogisticRegression
> > estimator_classes = {"LogisticRegression": LogisticRegression, ...}
> >
> > so that one can load the appropriate class from the string description
> > without calling __import__ or eval; that sort of stuff.
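
A rough sketch of the registry idea, under the assumption that a model
description is a dict with "class", "init_params" and "fitted_params" keys.
`ESTIMATOR_CLASSES`, `load_model` and `TinyMeanModel` are hypothetical
names for illustration; nothing like this exists in sklearn today.

```python
class TinyMeanModel:
    """Hypothetical stand-in that follows the sklearn naming convention."""

    def __init__(self, shift=0.0):
        self.shift = shift

    def predict(self):
        return self.mean_


# Explicit allow-list: only classes named here can ever be instantiated.
ESTIMATOR_CLASSES = {"TinyMeanModel": TinyMeanModel}


def load_model(description):
    """Rebuild a model from a dict description (illustrative helper)."""
    name = description["class"]
    try:
        cls = ESTIMATOR_CLASSES[name]  # plain lookup, no __import__ or eval
    except KeyError:
        raise ValueError(f"unknown estimator class: {name!r}") from None
    model = cls(**description["init_params"])
    for attr, value in description["fitted_params"].items():
        setattr(model, attr, value)
    return model


restored = load_model({
    "class": "TinyMeanModel",
    "init_params": {"shift": 1.0},
    "fitted_params": {"mean_": 3.0},
})
```

Because the class comes from a fixed dictionary, a description naming
anything else is rejected instead of being imported, which is the safety
gain over "__import__".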
> >
> > I am aware this would not address the common complaint of "perfect
> > prediction reproducibility" across versions, but I think we can all
> > agree that this utopia of perfect reproducibility is not feasible.
> >
> > And in the long, long run, I agree that PFA/ONNX, or whichever similar
> > format emerges, is the way to go.
> >
> > J
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org <mailto:scikit-learn at python.org>
> > https://mail.python.org/mailman/listinfo/scikit-learn