[scikit-learn] [ANN] Scikit-learn 0.20.0

Javier López jlopez at ende.cc
Thu Sep 27 19:22:07 EDT 2018


First of all, congratulations on the release, great work, everyone!

I think model serialization should be a priority. In particular,
I think that (whenever practical) there should be a way of
serializing estimators (either unfitted or fitted) in a text-readable
format, preferably JSON or PMML/PFA (or one of several others).

Obviously for some models it is not practical (e.g. random forests with
thousands of trees), but for simpler situations I believe it would
provide a great tool for model sharing without the dangers of pickling
and the versioning hell.
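
To make the idea concrete, here is a minimal sketch of a text-readable
export for a simple linear model. It is only an illustration under stated
assumptions: the function names and the JSON layout (including the
"model_type" tag) are hypothetical, not an existing scikit-learn API, and
the coefficients stand in for what a fitted linear estimator exposes as
coef_ and intercept_.

```python
import json


def linear_model_to_json(coef, intercept, feature_names):
    """Serialize the learned parameters of a linear model as JSON text."""
    payload = {
        "model_type": "linear_regression",  # hypothetical tag, not a standard
        "feature_names": list(feature_names),
        "coef": [float(c) for c in coef],
        "intercept": float(intercept),
    }
    return json.dumps(payload, indent=2, sort_keys=True)


def predict_from_json(text, row):
    """Rebuild the model from JSON and predict for one row (dict of features)."""
    payload = json.loads(text)
    total = payload["intercept"]
    for name, weight in zip(payload["feature_names"], payload["coef"]):
        total += weight * row[name]
    return total


# Illustrative values only; in practice they would come from a fitted model.
text = linear_model_to_json([2.0, -1.0], 0.5, ["age", "income"])
prediction = predict_from_json(text, {"age": 3.0, "income": 4.0})
```

The resulting file can be read, diffed and loaded without unpickling
anything, which is the point of the proposal.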

I am (painfully) aware that rebuilding a model on a different setup
might yield different results. In my company we address that by saving,
alongside the serialized model, a reasonably small validation dataset
and its predictions; upon deserializing, we check that the rebuilt
model reproduces those predictions within some acceptable range.
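
The check described above could be sketched roughly as follows. This is
an illustration, not the code we actually run: the bundle layout, the
function names and the default tolerances are all assumptions, and the
toy linear model stands in for any estimator that can be rebuilt from
text.

```python
import json
import math


def predict(model, row):
    """Prediction of a toy linear model stored as a plain dict."""
    return model["intercept"] + sum(w * x for w, x in zip(model["coef"], row))


def dump_with_validation(model, validation_rows):
    """Store the model together with validation rows and their predictions."""
    bundle = {
        "model": model,
        "validation": {
            "rows": validation_rows,
            "predictions": [predict(model, r) for r in validation_rows],
        },
    }
    return json.dumps(bundle)


def load_and_check(text, rel_tol=1e-9, abs_tol=1e-12):
    """Rebuild the model and verify it reproduces the stored predictions."""
    bundle = json.loads(text)
    model = bundle["model"]
    check = bundle["validation"]
    for row, expected in zip(check["rows"], check["predictions"]):
        got = predict(model, row)
        if not math.isclose(got, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            raise ValueError(
                f"rebuilt model predicts {got!r}, expected {expected!r}"
            )
    return model


model = {"coef": [2.0, -1.0], "intercept": 0.5}
text = dump_with_validation(model, [[1.0, 1.0], [0.0, 2.0]])
restored = load_and_check(text)
```

The tolerances are where "some acceptable range" gets decided; they
would need to be chosen per model and per use case.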

About the new release, I am particularly happy about the joblib update,
as it has been a major source of pain for me over the last year. On that
note, I think it would be a good idea to stop vendoring joblib and list it
as a dependency instead; wheels, pip and conda are mature enough to
handle the situation nowadays.

Last, but not least, it would be great to relax the checks concerning NaNs
at prediction time, and allow, for instance, an estimator to yield NaN
if any of its features are NaN. We face that situation when working with
ensembles, where a few of the submodels might not have enough features
available, but the rest do.
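
A rough sketch of the behaviour I have in mind, in plain Python rather
than scikit-learn. Everything here is hypothetical: the submodels are toy
weighted sums, and averaging the usable predictions is just one possible
way of combining them.

```python
import math


def submodel_predict(weights, features):
    """Weighted sum over the features this submodel uses.

    Returns NaN instead of raising if any required feature is NaN.
    """
    values = [features[name] for name in weights]
    if any(math.isnan(v) for v in values):
        return math.nan
    return sum(w * features[name] for name, w in weights.items())


def ensemble_predict(submodels, features):
    """Average the submodels that produced a usable (non-NaN) prediction."""
    preds = [submodel_predict(w, features) for w in submodels]
    usable = [p for p in preds if not math.isnan(p)]
    if not usable:
        return math.nan  # no submodel had enough features
    return sum(usable) / len(usable)


# Feature "a" is missing, so only the first submodel yields NaN.
submodels = [{"a": 1.0, "b": 2.0}, {"b": 3.0}, {"c": 1.0}]
row = {"a": math.nan, "b": 2.0, "c": 4.0}
result = ensemble_predict(submodels, row)
```

With a hard check on NaNs at prediction time, the first submodel would
raise and the whole ensemble would fail, even though the other two
submodels have everything they need.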

Off the top of my head, that's all. Keep up the fantastic work!
J

On Thu, Sep 27, 2018 at 6:31 PM Andreas Mueller <t3kcit at gmail.com> wrote:

> I think we should work on the formatting, make sure it's complete, link it
> to issues/PRs, and then make this into a public document on the website
> and request feedback.
>
> Right now it's in a format that is understandable for core developers,
> but some of the things are not clear to the average audience. Linking
> the issues/PRs will help with that a bit, but we might also want to add
> a sentence to each point in the roadmap.
>
> I had some issues with the formatting; I'll try to fix that later.
> Any volunteers for adding the frozen estimator (or has someone added that
> already)?
>
> Cheers,
> Andy
>
>
> On 09/27/2018 04:29 AM, Olivier Grisel wrote:
>
> On Wed, Sep 26, 2018 at 23:02, Joel Nothman <joel.nothman at gmail.com>
> wrote:
>
>> And for those interested in what's in the pipeline, we are trying to
>> draft a roadmap...
>> https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018
>>
>> But there are no doubt many features that are absent there too.
>>
>
> Indeed, it would be great to get some feedback on this roadmap from heavy
> scikit-learn users: which points do you think are the most important? What
> is missing from this roadmap?
>
> Feel free to reply to this thread.
>
> --
> Olivier
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>