Hey everybody! I'm happy to (finally) announce scikit-learn 0.20.0. This release is dedicated to the memory of Raghav Rajagopalan.

You can upgrade now with pip or conda!

There are many important additions and updates, and you can find the full release notes here: http://scikit-learn.org/stable/whats_new.html#version-0-20

My personal highlights are the ColumnTransformer and the changes to OneHotEncoder, but there's so much more!

An important note is that this is the last version to support Python 2.7; the next release will require Python 3.5.

A big thank you to everybody who contributed, and special thanks to Joel!

All the best, Andy
Congratulations! Bertrand

On 26/09/2018 20:55, Andreas Mueller wrote:
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
Congratulations! Thank you very much for everyone's hard work! Raga On Wed, Sep 26, 2018, 2:57 PM Andreas Mueller <t3kcit@gmail.com> wrote:
Wow. It's finally out!! Thank you to the cast of thousands, but also to some individuals for real dedication and insight!

Yet there's so much more still in the pipeline. If we're clever about things, we'll make the next release cycle shorter and the release more manageable.
On 09/26/2018 04:49 PM, Joel Nothman wrote:
There's always so much more :) And yes, we should strive to cut down our release cycle (significantly). Let's see if we manage.
And for those interested in what's in the pipeline, we are trying to draft a roadmap... https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018 But there are no doubt many features that are absent there too.
On Wed, Sep 26, 2018 at 11:02 PM, Joel Nothman <joel.nothman@gmail.com> wrote:
Indeed, it would be great to get some feedback on this roadmap from heavy scikit-learn users: which points do you think are the most important? What is missing from this roadmap? Feel free to reply to this thread. -- Olivier
I think we should work on the formatting, make sure it's complete, link it to issues/PRs, and then make this into a public document on the website and request feedback.

Right now it's in a format that core developers can understand, but some of the items won't be clear to a general audience. Linking the issues/PRs will help a bit, but we might also want to add a sentence to each point in the roadmap.

I had some issues with the formatting; I'll try to fix that later. Any volunteers for adding the frozen estimator (or has someone added that already)?

Cheers, Andy

On 09/27/2018 04:29 AM, Olivier Grisel wrote:
On Wed, Sep 26, 2018 at 11:02 PM, Joel Nothman <joel.nothman@gmail.com> wrote:
First of all, congratulations on the release, great work, everyone!

I think model serialization should be a priority. In particular, I think that (whenever practical) there should be a way of serializing estimators (either unfitted or fitted) in a text-readable format, preferably JSON or PMML/PFA (or several others). Obviously for some models it is not practical (e.g. random forests with thousands of trees), but for simpler situations I believe it would provide a great tool for model sharing without the dangers of pickling and the versioning hell.

I am (painfully) aware that when rebuilding a model on a different setup, it might yield different results; in my company we address that by saving, together with the serialized model, a reasonably small validation dataset and its predictions. Upon deserializing, we check that the rebuilt model reproduces those predictions within some acceptable range.

About the new release, I am particularly happy about the joblib update, as it has been a major source of pain for me over the last year. On that note, I think it would be a good idea to stop vendoring joblib and list it as a dependency instead; wheels, pip, and conda are mature enough to handle the situation nowadays.

Last, but not least, it would be great to relax the checks concerning NaNs at prediction time and allow, for instance, an estimator to yield NaN if any of its features are NaN. We face that situation when working with ensembles, where a few of the submodels might not get enough features available, but the rest do.

Off the top of my head, that's all; keep up the fantastic work! J

On Thu, Sep 27, 2018 at 6:31 PM Andreas Mueller <t3kcit@gmail.com> wrote:
Congrats everyone, this is awesome!!! I just started teaching an ML course this semester and introduced scikit-learn this week -- it was great timing to demonstrate how well maintained the library is and praise all the efforts that go into it :).
I think model serialization should be a priority.
While this could potentially be a bit inefficient for large non-parametric models, I think serialization into a text-readable format has some advantages for real-world use cases: e.g., sharing models in applications (pickle is a bit problematic because of security issues), but also as supplementary material in archives accompanying research articles, etc., especially in cases where datasets cannot be shared in their original form due to copyright or other concerns.

Chris Emmery, Chris Wagner and I toyed around with JSON a while back (https://cmry.github.io/notes/serialize), and it could be feasible -- but yeah, it will involve some work, especially with testing things thoroughly for all kinds of estimators. Maybe this could somehow be automated in a grid-search kind of way, with a build matrix for estimators and parameters, once a general framework has been developed.
On Fri, Sep 28, 2018 at 1:03 AM Sebastian Raschka <mail@sebastianraschka.com> wrote:
Chris Emmery, Chris Wagner and I toyed around with JSON a while back (https://cmry.github.io/notes/serialize), and it could be feasible
I came across your notes a while back; they were really useful! I hacked together a variation of it that doesn't need to know the model class in advance: https://gist.github.com/jlopezpena/2cdd09c56afda5964990d5cf278bfd31 but it is VERY hackish, and it doesn't work with complex models with nested components. (At work we use a further variation of this that also works on pipelines and some specific nested stuff, like `mlxtend`'s `SequentialFeatureSelector`.)
but yeah, it will involve some work, especially with testing things thoroughly for all kinds of estimators. Maybe this could somehow be automated though in a grid-search kind of way with a build matrix for estimators and parameters once a general framework has been developed.
I considered making this serialization into an external project, but I think this would be much easier if estimators provided a dunder method `__serialize__` (or whatever) that would handle the idiosyncrasies of each particular family; I don't believe there will be a "one-size-fits-all" solution for this problem. This approach would also make it possible to work on it incrementally, raising a default `NotImplementedError` for estimators that haven't been addressed yet. In the long run, I also believe that the "proper" way to do this is to allow dumping entire processes into PFA: http://dmg.org/pfa/docs/motivation/
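For concreteness, a minimal sketch of what such a per-family serializer could start from (the `serialize` helper is an assumption, not an existing API; it relies on scikit-learn's convention that fitted attributes are public names ending in a trailing underscore):

```python
import json

import numpy as np
from sklearn.linear_model import LogisticRegression

def serialize(model):
    """Sketch: split init params (via get_params) from fitted attributes,
    which scikit-learn stores as public attributes ending in '_'."""
    fitted = {
        name: value.tolist() if isinstance(value, np.ndarray) else value
        for name, value in vars(model).items()
        if name.endswith("_") and not name.startswith("_")
    }
    return json.dumps({
        "class": type(model).__name__,
        "module": type(model).__module__,
        "init_params": model.get_params(),
        "fitted_params": fitted,
    })

model = LogisticRegression(solver="liblinear").fit([[0., 0.], [1., 1.]], [0, 1])
payload = json.loads(serialize(model))
# payload["fitted_params"] now holds coef_, intercept_, classes_, ... as plain lists
```

A real `__serialize__` would of course need per-family handling of non-array attributes (fitted sub-estimators, tree structures, and so on), which is exactly where the one-size-fits-all approach breaks down.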
I think model serialization should be a priority.
There is also the ONNX specification that is gaining industrial adoption and that already includes open source exporters for several families of scikit-learn models: https://github.com/onnx/onnxmltools -- Olivier
There is also the ONNX specification that is gaining industrial adoption and that already includes open source exporters for several families of scikit-learn models:
Didn't know about that. This is really nice! What do you think about referring to it under http://scikit-learn.org/stable/modules/model_persistence.html to make people aware that this option exists? Would be happy to add a PR. Best, Sebastian
On 09/28/2018 12:10 PM, Sebastian Raschka wrote:
Didn't know about that. This is really nice! What do you think about referring to it under http://scikit-learn.org/stable/modules/model_persistence.html to make people aware that this option exists? Would be happy to add a PR.
I don't think an open source runtime has been announced yet (or they didn't email me like they promised lol). I'm quite excited about this as well.

Javier: The problem is not so much storing the "model" but storing how to make predictions. Different versions could act differently on the same data structure -- and the data structure could change. Both happen in scikit-learn. So if you want to make sure the right thing happens across versions, you either need to provide serialization and deserialization for every version, plus conversion between them, or you need to provide a way to store the prediction function, which basically means you need a Turing-complete language (that's what ONNX does).

We basically said doing the first is not feasible within scikit-learn given our current amount of resources, and no one has even tried doing it outside of scikit-learn (which would be possible). Implementing a complete prediction serialization language (the second option) is definitely outside the scope of sklearn.
Maybe we should add to the FAQ why serialization is hard?
How about a Docker-based approach? Just thinking out loud. Best, Manuel

On Fri, Sep 28, 2018, 19:43 Andreas Mueller <t3kcit@gmail.com> wrote:
On Fri, Sep 28, 2018 at 6:41 PM Andreas Mueller <t3kcit@gmail.com> wrote:
Javier: The problem is not so much storing the "model" but storing how to make predictions. Different versions could act differently on the same data structure - and the data structure could change. Both happen in scikit-learn. So if you want to make sure the right thing happens across versions, you either need to provide serialization and deserialization for every version and conversion between those or you need to provide a way to store the prediction function, which basically means you need a turing-complete language (that's what ONNX does).
I understand the difficulty of the situation, but an approximate solution is to save the predictions from a large enough validation set. If the predictions of the newly created model are "close enough" to the old ones, we deem the deserialized model to be the same and move forward; if there are serious discrepancies, then we dive deep to see what's going on, and if needed refit the offending submodels with the newer version.

Since we only want to compare predictions here, we don't need a ground truth, and thus the validation set doesn't even need to be a real dataset: it can consist of synthetic datapoints created via SMOTE, Caruana's MUNGE algorithm, or any other method, and can be made arbitrarily large in advance.

This method has worked reasonably well for us in practice; we deal with ensembles containing hundreds or thousands of models, and this technique saves us from having to refit many of them that don't change very often. And if something changes a lot, we want to know in either case, to ascertain what was amiss (with either the old version or the new one).

The situation I am proposing is not worse than what we have right now, which is: save a pickle and then hope that it can be read later on; sometimes it can, sometimes it cannot, depending on what changed. Stuff unrelated to the models themselves, such as changes in the joblib dump method, broke several of our pickle files in the past. What I would like to have is a text-based representation of the fitted model that can always be read, stored in a database, or sent over the wire through simple methods. J
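The check Javier describes can be sketched in a few lines (a sketch, not his company's actual code; a plain pickle round-trip stands in for whatever serialization is used, and uniform random points stand in for SMOTE/MUNGE output):

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X_train = rng.rand(100, 4)
y_train = X_train.sum(axis=1)

# The validation set needs no ground-truth labels: any fixed synthetic
# sample works as a "fingerprint" of the model's behaviour.
X_val = rng.rand(50, 4)

model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_train, y_train)
reference = model.predict(X_val)  # stored alongside the serialized model

# Later, on a possibly different setup, round-trip the model...
rebuilt = pickle.loads(pickle.dumps(model))

# ...and accept it only if it reproduces the stored predictions.
if np.allclose(rebuilt.predict(X_val), reference, atol=1e-6):
    print("round-trip OK")
else:
    print("discrepancy: inspect or refit the model")
```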
On 09/28/2018 03:20 PM, Javier López wrote:
I understand the difficulty of the situation, but an approximate solution to that is saving the predictions from a large enough validation set. If the prediction for the newly created model are "close enough" to the old ones, we deem the unserialized model to be the same and move forward, if there are serious discrepancies, then we dive deep to see what's going on, and if needed refit the offending submodels with the newer version.
Basically what you're saying is that you're fine with versioning the models and having a model break loudly if anything changes. That's not actually what most people want. They want to be able to make predictions with a given model forever into the future. Your use case is similar, but if retraining the model is not an issue, why don't you want to retrain every time scikit-learn releases a new version?

We're now storing the version of scikit-learn that was used in the pickle and warn if you're trying to load with a different version. That's basically a stricter test than what you wanted. Yes, there are false positives, but given that this release took a year, this doesn't seem that big an issue?
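The version-stamp idea can be illustrated with a small wrapper (a sketch with hypothetical helper names, `dump_with_version`/`load_with_version`; scikit-learn itself stores the version inside the estimator's pickle state rather than in a wrapper dict like this):

```python
import os
import pickle
import tempfile
import warnings

import sklearn
from sklearn.linear_model import LogisticRegression

def dump_with_version(model, path):
    # Record the sklearn version used at training time next to the model.
    with open(path, "wb") as f:
        pickle.dump({"sklearn_version": sklearn.__version__, "model": model}, f)

def load_with_version(path):
    with open(path, "rb") as f:
        payload = pickle.load(f)
    if payload["sklearn_version"] != sklearn.__version__:
        warnings.warn(
            "Model pickled with scikit-learn %s but running %s; "
            "predictions may differ."
            % (payload["sklearn_version"], sklearn.__version__)
        )
    return payload["model"]

model = LogisticRegression(solver="liblinear").fit([[0., 0.], [1., 1.]], [0, 1])
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
dump_with_version(model, path)
restored = load_with_version(path)  # warns if the versions disagree
```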
On Fri, Sep 28, 2018 at 8:46 PM Andreas Mueller <t3kcit@gmail.com> wrote:
Basically what you're saying is that you're fine with versioning the models and having the model break loudly if anything changes. That's not actually what most people want. They want to be able to make predictions with a given model for ever into the future.
Are we talking about "(the new version of) the old model can still make predictions" or "the old model makes exactly the same predictions as before"? I'd like the first to hold, don't care that much about the second.
Your use-case is similar, but if retraining the model is not an issue, why don't you want to retrain every time scikit-learn releases a new version?
Thousands of models. I don't want to retrain ALL of them unless needed.
We're now storing the version of scikit-learn that was used in the pickle and warn if you're trying to load with a different version.
This is not the whole truth. Yes, you store the sklearn version in the pickle and raise a warning; I am mostly OK with that, but the pickles are brittle, and oftentimes they stop loading when versions of other packages change. I am not talking about "Warning: wrong version", but rather "Unpickling error: expected bytes, found tuple" errors that prevent the file from loading entirely.
That's basically a stricter test than what you wanted. Yes, there are false positives, but given that this release took a year, this doesn't seem that big an issue?
1. Things in the current state break when something else changes, not only sklearn.
2. Sharing pickles is a bad practice for a number of reasons.
3. We might want to explore model parameters without having to load the entire runtime.

Also, in order to retrain a model we need to keep the whole model description with its parameters. This needs to be saved somewhere, which in the current state forces us to keep two files: one with the parameters (in a text format, to avoid the non-loading problems from above) and the pkl with the fitted model. My proposal would keep both in a single file.

As mentioned in previous emails, we already have our own solution that kind-of-works for our needs, but we have to do a few hackish things to keep it running. If sklearn estimators simply included a text serialization method (similar in spirit to the one used for __display__ or __repr__), it would make things easier. But I understand that not everyone's needs are the same, so if you guys don't consider this type of thing a priority, we can live with that :) I mostly mentioned it since "Backwards-compatible de/serialization of some estimators" is listed in the roadmap as a desirable goal for version 1.0, and feedback on the roadmap was requested. J
On 09/28/2018 04:45 PM, Javier López wrote:
On Fri, Sep 28, 2018 at 8:46 PM Andreas Mueller <t3kcit@gmail.com> wrote:
Basically what you're saying is that you're fine with versioning the models and having the model break loudly if anything changes. That's not actually what most people want. They want to be able to make predictions with a given model for ever into the future.
Are we talking about "(the new version of) the old model can still make predictions" or "the old model makes exactly the same predictions as before"? I'd like the first to hold, don't care that much about the second.
The second.
We're now storing the version of scikit-learn that was used in the pickle and warn if you're trying to load with a different version.
This is not the whole truth. Yes, you store the sklearn version on the pickle and raise a warning; I am mostly ok with that, but the pickles are brittle and oftentimes they stop loading when other versions of other stuff change. I am not talking about "Warning: wrong version", but rather "Unpickling error: expected bytes, found tuple" that prevent the file from loading entirely.
Can you give examples of that? That shouldn't really happen afaik.
That's basically a stricter test than what you wanted. Yes, there are false positives, but given that this release took a year, this doesn't seem that big an issue?
1. Things in the current state break when something else changes, not only sklearn. 2. Sharing pickles is a bad practice due to a number of reasons. 3. We might want to explore model parameters without having to load the entire runtime
I agree, it would be great to have something other than pickle. But as I said, the usual request is "I want a way for a model to make the same predictions in the future". If you have a way to do that with a text-based format that doesn't require writing lots of version converters, I'd be very happy.

Generally, what you want is not to store the model but to store the prediction function, and to have separate runtimes for training and prediction. It might not be possible to represent a model from a previous version of scikit-learn in a newer version.
On Fri, Sep 28, 2018 at 09:45:16PM +0100, Javier López wrote:
This is not the whole truth. Yes, you store the sklearn version on the pickle and raise a warning; I am mostly ok with that, but the pickles are brittle and oftentimes they stop loading when other versions of other stuff change. I am not talking about "Warning: wrong version", but rather "Unpickling error: expected bytes, found tuple" that prevent the file from loading entirely. [...] 1. Things in the current state break when something else changes, not only sklearn. 2. Sharing pickles is a bad practice due to a number of reasons.
The reason that pickles are brittle, and that sharing pickles is a bad practice, is that pickle uses an implicitly defined data model, which is defined via the internals of objects.

The "right" solution is to use an explicit data model. This is, for instance, what is done with an object database. However, this comes at the cost of making it very hard to change objects. First, all objects must be stored with a schema (or language) that is rich enough to represent them, and yet defined somewhat explicitly (to avoid running into the problems of pickle). Second, if the internal representation of an object changes, there needs to be explicit conversion code to go from one version to the next. Typically, upgrades of websites that use an object database need maintainers to write this conversion code.

So, the problems of pickle are not specific to pickle, but rather intrinsic to any generic persistence code [*]. Writing persistence code that does not fall into these problems is very costly in terms of developer time, and it makes it harder to add new methods or improve existing ones. I am not excited about it.

Rather, the good practice is: if you want to deploy models, deploy them on the exact same environment that you trained them on. The web world is very used to doing that (because they keep falling into these problems) and has developed technology for it, such as Docker containers. I know that it is clunky technology. I don't like it myself, but I don't see a way out of it with our resources.

Gaël

[*] Back in the day, when I was working on Mayavi, we developed our own persistence code because we were not happy with pickle. It was not pleasant to maintain, and it had the same "smell" as pickle. I don't think that it was a great use of our time.
On 10/02/2018 12:01 PM, Gael Varoquaux wrote:
So, the problems of pickle are not specific to pickle, but rather intrinsic to any generic persistence code [*]. Writing persistence code that does not fall in these problems is very costly in terms of developer time and makes it harder to add new methods or improve existing one. I am not excited about it.
I think having MS, FB, Amazon, IBM, Nvidia, Intel, ... maintain our generic persistence code is a decent deal for us *if* it works out ;) https://onnx.ai/ (MS is providing sklearn-to-ONNX converters and is extending ONNX to allow more sklearn estimators to be expressed in ONNX.) Containers are a reasonable fallback, though.
On Tue, Oct 02, 2018 at 12:20:40PM -0400, Andreas Mueller wrote:
I think that having MS, FB, Amazon, IBM, Nvidia, Intel, ... maintain our generic persistence code is a decent deal for us if it works out ;)
I'll take that deal! :) +1 for onnx, absolutely! G
On Tue, Oct 2, 2018 at 5:07 PM Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
The reason that pickles are brittle and that sharing pickles is a bad practice is that pickle uses an implicitly defined data model, which is defined via the internals of objects.
Plus the fact that loading a pickle can execute arbitrary code, and there is no way to know if any malicious code is in there in advance because the contents of the pickle cannot be easily inspected without loading/executing it.
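The danger Javier describes can be made concrete with a minimal sketch (the `Payload` class name is invented for illustration): pickle's `__reduce__` hook lets a malicious pickle name any callable for `pickle.loads` to invoke at load time. Here the callable is a harmless `print`, but it could just as well be `os.system`.

```python
import pickle

# A hypothetical payload class: __reduce__ tells pickle to call an
# arbitrary callable with arbitrary arguments when the blob is loaded.
class Payload:
    def __reduce__(self):
        # pickle.loads will execute: print("this ran at load time")
        return (print, ("this ran at load time",))

blob = pickle.dumps(Payload())
obj = pickle.loads(blob)  # prints "this ran at load time"
# The "deserialized object" is just the callable's return value:
print(obj)  # None
```

Nothing in the byte stream reveals this behavior without executing it, which is why inspecting a pickle safely is so hard.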
So, the problems of pickle are not specific to pickle, but rather intrinsic to any generic persistence code [*]. Writing persistence code that does not fall into these problems is very costly in terms of developer time and makes it harder to add new methods or improve existing ones. I am not excited about it.
My "text-based serialization" suggestion was nowhere near as ambitious as that, as I have already explained, and wasn't aiming at solving the versioning issues, but rather at having something which is "about as good" as pickle but in a human-readable format. I am not asking for a Turing-complete language to reproduce the prediction function, but rather something simple in the spirit of the output produced by the gist code I linked above, just for the model families where it is reasonable:

https://gist.github.com/jlopezpena/2cdd09c56afda5964990d5cf278bfd31

The code I posted mostly works (specific cases of nested models need to be addressed separately, as well as pipelines), and we have been using (a version of) it in production for quite some time. But there are hackish aspects to it that we are not happy with, such as the manual separation of init and fitted parameters by checking whether the name ends with "_", having to infer class name and location using "model.__class__.__name__" and "model.__module__", and the wacky use of "__import__".

My suggestion was more along the lines of adding some metadata to sklearn estimators so that code in a similar style would be nicer to write; little things like having `init_parameters` and `fit_parameters` properties that would return the lists of named parameters, or a `model_info` method that would return data like sklearn version, class name and location, or a package-level dictionary pointing at the estimator classes by a string name, like

from sklearn.linear_model import LogisticRegression
estimator_classes = {"LogisticRegression": LogisticRegression, ...}

so that one can load the appropriate class from the string description without calling __import__ or eval; that sort of stuff.

I am aware this would not address the common complaint of "perfect prediction reproducibility" across versions, but I think we can all agree that this utopia of perfect reproducibility is not feasible.
And in the long, long run, I agree that PFA/ONNX, or whichever similar format emerges, is the way to go. J
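The approach Javier describes can be sketched as follows. This is a toy illustration, not the gist's actual code: the `ToyScaler` class and helper names are invented, and it relies only on the scikit-learn convention that fitted attributes end with "_". A class registry stands in for the `estimator_classes` dictionary he proposes, avoiding `__import__` and `eval`.

```python
import json

# A toy estimator following scikit-learn conventions: __init__ parameters
# are stored under the same names; fitted attributes end with "_".
class ToyScaler:
    def __init__(self, with_mean=True):
        self.with_mean = with_mean

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

def serialize(model):
    # Split attributes the way the gist does: names ending in "_" are
    # fitted state, the rest are construction parameters.
    init = {k: v for k, v in vars(model).items() if not k.endswith("_")}
    fitted = {k: v for k, v in vars(model).items() if k.endswith("_")}
    return json.dumps({
        "class": type(model).__name__,
        "module": type(model).__module__,
        "init_parameters": init,
        "fit_parameters": fitted,
    })

def deserialize(blob, estimator_classes):
    # A registry of string name -> class avoids __import__ and eval.
    data = json.loads(blob)
    model = estimator_classes[data["class"]](**data["init_parameters"])
    for k, v in data["fit_parameters"].items():
        setattr(model, k, v)
    return model

blob = serialize(ToyScaler().fit([1.0, 2.0, 3.0]))
restored = deserialize(blob, {"ToyScaler": ToyScaler})
print(restored.mean_)  # 2.0
```

Unlike a pickle, the blob is human-readable JSON, and loading it can only construct classes explicitly listed in the registry.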
The ONNX approach sounds most promising, esp. because it will also allow library interoperability, but I wonder if this is for parametric models only and not for nonparametric ones like KNN, tree-based classifiers, etc.

All in all, I can definitely see the appeal of having a way to export sklearn estimators in a text-based format (e.g., via JSON), since it would make sharing code easier. This doesn't even have to be compatible across multiple sklearn versions. A typical use case would be to include these JSON exports as, e.g., supplemental files of a research paper for other people to run the models (here, one can just specify which sklearn version it would require; of course, one could also share pickle files, but I am personally always hesitant regarding running/trusting other people's pickle files).

Unfortunately though, as Gael pointed out, this "feature" would be a huge burden for the devs, and it would probably also negatively impact the development of scikit-learn itself because it imposes another design constraint. However, I do think this sounds like an excellent case for a contrib project, like scikit-export, scikit-serialize or sth like that.

Best, Sebastian
For ONNX you may be interested in https://github.com/onnx/onnxmltools - which supports conversion of a few sklearn models to ONNX already. However, as far as I am aware, none of the ONNX backends actually support the ONNX-ML extended spec (in open source, at least), so you would not be able to actually do prediction, I think...

As for PFA, to my current knowledge there is no library that does it yet. Our own Aardpfark project (https://github.com/CODAIT/aardpfark) focuses on SparkML export to PFA for now, but we would like to add sklearn support in the future.

On Wed, 3 Oct 2018 at 20:07 Sebastian Raschka <mail@sebastianraschka.com> wrote:
On 10/03/2018 03:32 PM, Nick Pentreath wrote:
For ONNX you may be interested in https://github.com/onnx/onnxmltools - which supports conversion of a few sklearn models to ONNX already.
However, as far as I am aware, none of the ONNX backends actually support the ONNX-ML extended spec (in open source, at least), so you would not be able to actually do prediction, I think...

Exactly, that's what I'm waiting for. MS is working on it, AFAIK.
On 26/09/2018 at 21:59, Joel Nothman wrote:
And for those interested in what's in the pipeline, we are trying to draft a roadmap... https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018

Hello,
First of all, thanks for the incredible work on scikit-learn. I found the roadmap quite cool and in line with some of my own concerns. In particular:

* "Make it easier for external users to write Scikit-learn-compatible components" - really a great goal to have a stable ecosystem
* "Passing around information that is not (X, y)" - faced it.
* "Better interface for interactive development" (wow - very feature - such cool - how many great !)
* Improved tracking of fitting (cool for early stopping while doing hyper-parameter search, or simply testing some model in a notebook)

However, here are some aspects that I, modestly, would like to see (maybe for some of them there is work in progress or an external lib; let me know):

* Chunk processing (a kind of handling of streaming data): when dealing with lots of data, the ability to call partial_fit, then use transform on chunks of data, is a great help. But it's not well exposed in the current doc and API, and a lot of models do not support it while they could. Also, Pipeline does not support partial_fit, and there is no partial counterpart of fit_transform.
* While handling "Passing around information that is not (X, y)", is there any plan to have transform be able to transform both X and y? This would ease lots of problems like subsampling, resampling, or masking data when it is too incomplete. In my case, for example, while transforming words to vectors, I may end up with sentences full of out-of-vocabulary words, hence some samples I would like to set aside, but can't, because I do not have my hands on y (and introducing it makes me lose my ability to use my precious pipeline). I think Python offers possibilities to handle the API change (for example, we can have a new transform_xy method, and a compatibility transform using it until deprecation).

Also, I understand that changing the API is always a big deal. But I think scikit-learn, because of its API, has played a good role in standardizing the Python ML ecosystem, and this is a key contribution.
Not dealing with mature new needs and some of the actual API's initial flaws may do the whole community a disservice, as new independent and inconsistent APIs will flourish, since no other project has the legitimacy of scikit-learn. So courage :-)

Also, having good integrations with popular frameworks like keras or gensim would be great (but that is the goal of third-party packages, of course).

Of course, writing all this, I don't want to sound pedantic. I know I'm not so experienced with scikit-learn (nor did I contribute to it), so take it for what it is. Have a good day!

Alex

-- Alexandre Garel tel : +33 7 68 52 69 07 / +213 656 11 85 10 skype: alexgarel / ring: ba0435e11af36e32e9b4eb13c19c52fd75c7b4b0
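The transform_xy idea above can be sketched with a toy transformer. This is a hypothetical API, not part of scikit-learn: the class and method names are invented to show how dropping samples while keeping X and y aligned could look.

```python
# A toy transformer with the proposed transform_xy method: it drops
# samples (here, sentences with no remaining tokens) and filters y in
# lockstep, which plain transform(X) cannot do.
class DropEmptySentences:
    def fit(self, X, y=None):
        # Stateless; present only to mimic the estimator interface.
        return self

    def transform_xy(self, X, y):
        # Keep only (x, y) pairs where the sample still carries
        # information (a non-empty token list).
        kept = [(x, t) for x, t in zip(X, y) if len(x) > 0]
        if not kept:
            return [], []
        Xs, ys = zip(*kept)
        return list(Xs), list(ys)

X = [["hello", "world"], [], ["foo"]]
y = [1, 0, 1]
Xt, yt = DropEmptySentences().fit(X, y).transform_xy(X, y)
print(Xt, yt)  # [['hello', 'world'], ['foo']] [1, 1]
```

The backwards-compatibility path Alex mentions would be a transform(X) that delegates to transform_xy with a dummy y until deprecation.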
Thank you for your feedback Alex! On 10/02/2018 09:28 AM, Alex Garel wrote:
* Chunk processing (a kind of handling of streaming data): when dealing with lots of data, the ability to call partial_fit, then use transform on chunks of data, is a great help. But it's not well exposed in the current doc and API,
This has been discussed in the past, but it looks like no one was excited enough about it to add it to the roadmap. It would require quite some additions to the API. Olivier, who has been quite interested in this before, now seems to be more interested in integration with dask, which might achieve the same thing.
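For concreteness, the chunked-processing pattern under discussion can be sketched with a toy transformer (an invented class, not a scikit-learn estimator) that maintains a running mean via partial_fit, so arbitrarily large data can be consumed one chunk at a time.

```python
# A toy centering transformer supporting the partial_fit protocol:
# each chunk updates the running mean incrementally, so no chunk
# ever needs to be kept in memory after it is seen.
class RunningMeanCenterer:
    def __init__(self):
        self.n_ = 0
        self.mean_ = 0.0

    def partial_fit(self, chunk):
        # Welford-style incremental mean update.
        for x in chunk:
            self.n_ += 1
            self.mean_ += (x - self.mean_) / self.n_
        return self

    def transform(self, chunk):
        return [x - self.mean_ for x in chunk]

est = RunningMeanCenterer()
for chunk in ([1.0, 2.0], [3.0, 4.0], [5.0]):
    est.partial_fit(chunk)
print(est.mean_)             # 3.0
print(est.transform([4.0]))  # [1.0]
```

The difficulty Andreas raises below is real, though: chaining several such steps in a Pipeline is ill-defined, because a downstream step cannot transform anything until every upstream step has finished seeing the stream.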
* and a lot of models do not support it, while they could.
Can you give examples of that?
* Also, Pipeline does not support partial_fit, and there is no partial counterpart of fit_transform.
What would you expect those to do? Each step in the pipeline might require passing over the whole dataset multiple times before being able to transform anything, which basically makes the current interface impossible to use with Pipeline. Even if only a single pass over the dataset were required, that wouldn't work with the current interface. If we were handing around generators that allow looping over the whole data, that would work, but it would be unclear how to support a streaming setting.
* while handling "Passing around information that is not (X, y)", is there any plan to have transform being able to transform X and y ? This would ease lots of problems like subsampling, resampling or masking data when too incomplete.
An API for subsampling is on the roadmap :)
Le 02/10/2018 à 16:46, Andreas Mueller a écrit :
Thank you for your feedback Alex! Thanks for answering!
On 10/02/2018 09:28 AM, Alex Garel wrote:
* Chunk processing (a kind of handling of streaming data): when dealing with lots of data, the ability to call partial_fit, then use transform on chunks of data, is a great help. But it's not well exposed in the current doc and API,
This has been discussed in the past, but it looks like no one was excited enough about it to add it to the roadmap. It would require quite some additions to the API. Olivier, who has been quite interested in this before, now seems to be more interested in integration with dask, which might achieve the same thing.
I've tried to use dask on my side, but for now, though I got quite far, I didn't completely succeed, because (in my specific case) of memory issues (dask's default schedulers do not specialize processes on tasks, and I had some memory-consuming tasks, but I didn't get far enough to write my own scheduler). However, I might deal with that later (not by writing a scheduler, but by sharing memory with mmap, in this case). But yes, dask is about the "chunk instead of really streaming" approach (which was my point).
* and a lot of models do not support it, while they could.
Can you give examples of that? Hmm, I maybe spoke too fast! Grepping the code gives me some examples at least, and it's true that a DecisionTree does not support it naturally!
* Also, Pipeline does not support partial_fit, and there is no partial counterpart of fit_transform.
What would you expect those to do? Each step in the pipeline might require passing over the whole dataset multiple times before being able to transform anything, which basically makes the current interface impossible to use with Pipeline. Even if only a single pass over the dataset were required, that wouldn't work with the current interface. If we were handing around generators that allow looping over the whole data, that would work, but it would be unclear how to support a streaming setting. You're right, I didn't think hard enough about it!
BTW, I made some tests using generators, making fit/transform build pipelines that I consumed later on (tried with plain iterators and streamz). It did work somehow, with many hacks, but in my specific case performance was not good enough (the real problem was not framework performance, but my architecture: I realized that constantly re-generating the data, instead of doing it once, was not fast enough). So finally my points were not so good, but at least I did learn something ;-) Thanks for your time.

Alex
Hurray, thanks to everybody; in particular to those who did the hard work of ironing out the last issues and releasing. Gaël On Wed, Sep 26, 2018 at 02:55:57PM -0400, Andreas Mueller wrote:
-- Gael Varoquaux Senior Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
This is wonderful news! Congrats, everyone. I can't wait to check out the game-changing ColumnTransformer! Denis On Wed 26 Sep 2018 at 23:45, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
Congrats to the whole team! Aiden Nguyen -- Nguyen Thien Bao, PhD Director and Founder, HBB Tech, Vietnam Co-founder, HBB Solutions, Vietnam Head, R&D Division, Cardano Labo, Vietnam NeuroInformatics Laboratory (NILab), Fondazione Bruno Kessler (FBK), Trento, Italy Centro Interdipartimentale Mente e Cervello (CIMeC), Universita degli Studi di Trento, Italy Surgical Planning Laboratory (SPL), Department of Radiology, BWH, Harvard University, MA, USA Lecturer, Faculty of Information Technology, University of Technology and Education, Ho Chi Minh, Vietnam Email: bao at bwh.harvard.edu or tbnguyen at fbk.eu or baont at hbbsolution.com or ntbaovn at gmail.com Fax: +39.0461.283.091 Cellphone: +1. 857.265.6408 (USA) +39.345.293.1006 (Italy) +84.9.2761.3761 (VietNam) On Thu, Sep 27, 2018 at 12:49 PM Denis-Alexander Engemann <denis.engemann@gmail.com> wrote:
Huge, huge thank you, developers! Keep up the good work! On Wed, Sep 26, 2018 at 20:57, Andreas Mueller <t3kcit@gmail.com> wrote:
participants (13)
- Aiden Nguyen
- Alex Garel
- Andreas Mueller
- bthirion
- Denis-Alexander Engemann
- Gael Varoquaux
- Javier López
- Joel Nothman
- Manuel CASTEJÓN LIMAS
- Nick Pentreath
- Olivier Grisel
- Raga Markely
- Sebastian Raschka