Three new scikit-learn-contrib projects
Hi everyone,

We are pleased to announce that three new projects recently joined scikit-learn-contrib!

* imbalanced-learn: https://github.com/scikit-learn-contrib/imbalanced-learn
  Python module to perform under-sampling and over-sampling with various techniques.

* polylearn: https://github.com/scikit-learn-contrib/polylearn
  Factorization machines and polynomial networks for classification and regression in Python.

* forest-confidence-interval: https://github.com/scikit-learn-contrib/forest-confidence-interval
  Confidence intervals for scikit-learn forest algorithms.

We thank the respective authors for their neat contributions to the scikit-learn ecosystem!

Cheers,
Mathieu
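For anyone who wants to try imbalanced-learn right away, here is a minimal usage sketch. It assumes the RandomUnderSampler and SMOTE classes and their fit_sample method (renamed fit_resample in later releases); exact names and defaults may differ between versions.

    # Minimal sketch: under-sampling and over-sampling with imbalanced-learn.
    # Assumes imblearn and scikit-learn are installed; fit_sample is the
    # method name used at the time of this thread (later: fit_resample).
    from collections import Counter

    from sklearn.datasets import make_classification
    from imblearn.under_sampling import RandomUnderSampler
    from imblearn.over_sampling import SMOTE

    # A deliberately imbalanced toy problem (roughly 90% / 10%).
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)
    print(Counter(y))

    # Under-sampling: drop majority-class samples until the classes balance.
    X_under, y_under = RandomUnderSampler(random_state=0).fit_sample(X, y)
    print(Counter(y_under))

    # Over-sampling: synthesize extra minority-class samples with SMOTE.
    X_over, y_over = SMOTE(random_state=0).fit_sample(X, y)
    print(Counter(y_over))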
Congrats! These look great; thanks to both the authors and the scikit-learn-contrib organizers for putting this together.

Nelson
Awesome! Thanks to the contributors.
Hey,

These packages look great! I was interested in the imbalanced learning, which is something that we stumbled upon:
* imbalanced-learn: https://github.com/scikit-learn-contrib/imbalanced-learn
Python module to perform under sampling and over sampling with various techniques.
Interestingly, the fit_sample method is related to the scikit-learn enhancement proposal that we have tried to put together, for objects that can modify y in addition to X: https://github.com/scikit-learn/enhancement_proposals/pull/2

I think that this enhancement proposal for our API is important for two reasons. The first is that the corresponding objects cannot be put in a pipeline (imbalanced-learn ends up having its own pipeline), and hence cannot benefit from hyper-parameter tuning on the full set of steps, or from cool things like DaskLearn. The second is that different projects are likely to come up with similar but incompatible solutions to this problem, making it harder to combine things.

Unfortunately, I haven't had time to push this proposal forward, but comments on it (or a pull request to it) would be awesome.

Cheers,
Gaël
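To make the API point concrete, here is a small sketch of what goes wrong, assuming imbalanced-learn's RandomUnderSampler; the commented-out pipeline is only an illustration of what one would like to write.

    # Sketch of the API mismatch: a sampler's fit_sample returns a new X
    # *and* a new y (with a different number of samples), while a standard
    # scikit-learn Pipeline step is expected to be a transformer whose
    # transform(X) returns only X.
    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.linear_model import LogisticRegression
    from imblearn.under_sampling import RandomUnderSampler

    X = np.random.RandomState(0).randn(100, 5)
    y = np.array([0] * 90 + [1] * 10)

    sampler = RandomUnderSampler(random_state=0)
    X_res, y_res = sampler.fit_sample(X, y)   # both X and y are resampled

    # What one would like to write, but cannot: vanilla Pipeline requires
    # every intermediate step to implement transform, and transform has no
    # way to hand a modified y (or fewer samples) to the next step.
    # pipe = Pipeline([('sample', sampler), ('clf', LogisticRegression())])
    # pipe.fit(X, y)

This is exactly the gap that the enhancement proposal above tries to close.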
Hi Gaël,

I was wondering if you could elaborate on the problem of hyper-parameter tuning and why imbalanced-learn would not benefit from it. Since we reused the scikit-learn pipeline and only added the part that handles the sampler, I would have thought that we could use it.

That said, it is true that I did not play too much with this part of the API, so I have probably missed something.

Cheers,
Guillaume
The assumption is that hyper-parameter tuning uses Pipelines, I think. You want to select all the steps in your processing, which is rarely just a single model. However, a Pipeline step currently cannot change the number of samples (see the enhancement proposal Gaël linked to), so you cannot use your samplers in the standard scikit-learn pipeline.

Best,
Andy
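For completeness, here is a sketch of the workaround Gaël mentioned, namely that imbalanced-learn ships its own pipeline: with imblearn.pipeline.Pipeline the sampler becomes a regular step, so GridSearchCV can tune the sampler and the classifier together. The parameter names in the grid (k_neighbors, n_estimators) are illustrative and may differ between versions.

    # Sketch: hyper-parameter tuning over a sampler + classifier chain using
    # imbalanced-learn's own Pipeline (not sklearn.pipeline.Pipeline).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV
    from imblearn.pipeline import Pipeline
    from imblearn.over_sampling import SMOTE

    X, y = make_classification(n_samples=500, weights=[0.9, 0.1],
                               random_state=0)

    pipe = Pipeline([('smote', SMOTE(random_state=0)),
                     ('rf', RandomForestClassifier(random_state=0))])

    # Tune the sampler and the classifier in one grid search; the sampling is
    # re-run inside each cross-validation fold, on the training split only.
    param_grid = {'smote__k_neighbors': [3, 5],
                  'rf__n_estimators': [50, 100]}
    search = GridSearchCV(pipe, param_grid, cv=3, scoring='roc_auc')
    search.fit(X, y)
    print(search.best_params_)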
participants (6)
- Andreas Mueller
- Gael Varoquaux
- Guillaume Lemaître
- Mathieu Blondel
- Nelson Liu
- Startup Hire