Re: [scikit-learn] Issues with clone for ensemble of classifiers
Guillaume - thank you for the comments. Indeed, an approach to "freeze" a fitted classifier would solve our problem. The GitHub issue seems to have been inactive for a while, but I will check if anyone else is working on it.
Luiz Gustavo
On Wed, Sep 19, 2018 at 12:02 PM <scikit-learn-request@python.org> wrote:
Send scikit-learn mailing list submissions to scikit-learn@python.org
To subscribe or unsubscribe via the World Wide Web, visit https://mail.python.org/mailman/listinfo/scikit-learn or, via email, send a message with subject or body 'help' to scikit-learn-request@python.org
You can reach the person managing the list at scikit-learn-owner@python.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of scikit-learn digest..."
Today's Topics:
1. Re: Issues with clone for ensemble of classifiers (Guillaume Lemaître)
----------------------------------------------------------------------
Message: 1
Date: Wed, 19 Sep 2018 17:38:46 +0200
From: Guillaume Lemaître <g.lemaitre58@gmail.com>
To: Scikit-learn user and developer mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Issues with clone for ensemble of classifiers
Message-ID: <CACDxx9gyszjJP-5ZB_bvH4nCkdn-sb6CCb=k2j_kOOnFPBQt0g@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
However, there is an open issue about freezing a fitted classifier. You can refer to:
https://github.com/scikit-learn/scikit-learn/issues/8370
and the associated discussion.
On Wed, 19 Sep 2018 at 17:34, Guillaume Lemaître <g.lemaitre58@gmail.com> wrote:
Oops, I misread your comment. I don't think we currently have a mechanism to avoid cloning a classifier internally.
On Wed, 19 Sep 2018 at 17:31, Guillaume Lemaître <g.lemaitre58@gmail.com> wrote:
Nowhere in your class MyClassifier do you call base_classifier.fit(...); therefore, when you call base_classifier.predict(...), it lets you know that you did not fit it.
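The behaviour described in this reply can be seen in isolation. The following is only a minimal sketch: it assumes a DecisionTreeClassifier as a stand-in for the base classifier (the thread itself uses a BaggingClassifier), and shows that calling predict before fit raises NotFittedError, while a fitted instance predicts normally.

```python
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stand-in estimator (an assumption for illustration); fit() is never called on it.
unfitted = DecisionTreeClassifier()
try:
    unfitted.predict(X)
    raised = False
except NotFittedError:
    raised = True  # scikit-learn refuses to predict before fit

# A fitted instance carries its learned state and predicts without error.
fitted = DecisionTreeClassifier(random_state=0).fit(X, y)
predictions = fitted.predict(X)

print(raised, predictions.shape)
```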
On Wed, 19 Sep 2018 at 16:43, Luiz Gustavo Hafemann <luiz.gh@gmail.com> wrote:
Hello,
I am one of the developers of a library for Dynamic Ensemble Selection (DES) methods (the library is called DESlib), and we are currently working to get the library fully compatible with scikit-learn (to submit it to scikit-learn-contrib). We have "check_estimator" working for most of the classes, but now I am having problems making the classes compatible with GridSearch / other CV functions.
One of the main use cases of this library is to facilitate research in this field, and this led to a design decision that the base classifiers are fit by the user, and the DES methods receive a pool of base classifiers that were already fit (this allows users to compare many DES techniques with the same base classifiers). This is creating an issue with GridSearch, since the clone method (defined in sklearn.base) is not cloning the classes as we would like. It does a shallow (non-deep) copy of the parameters, but we would like the pool of base classifiers to be deep-copied.
I analyzed this issue and I could not find a solution that does not require changes to the scikit-learn code. Here is the sequence of steps that cause the problem:
1. GridSearchCV calls "clone" on the DES estimator (link)
2. The clone function calls the "get_params" function of the DES estimator (link, line 60). We don't re-implement this function, so it gets all the parameters, including the pool of classifiers (at this point, they are still "fitted")
3. The clone function then clones each parameter with safe=False (line 62). When cloning the pool of classifiers, the result is a pool that is not "fitted" anymore.
The problem is that, to my knowledge, there is no way for my classifier to inform "clone" that a parameter should always be deep copied. I see that other ensemble methods in sklearn always fit the base classifiers within the "fit" method of the ensemble, so this problem does not happen there. I would like to know if there is a solution for this problem while having the base classifiers fitted elsewhere.
Here is a short code snippet that reproduces the issue:
---------------------------
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import load_iris


class MyClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, base_classifiers, k):
        self.base_classifiers = base_classifiers  # Base classifiers that are already trained
        self.k = k  # Simulate a parameter that we want to do a grid search on

    def fit(self, X_dsel, y_dsel):
        pass  # Here we would fit any parameters for the Dynamic selection method, not the base classifiers

    def predict(self, X):
        return self.base_classifiers.predict(X)  # In practice the methods would do something with the predictions of each classifier


X, y = load_iris(return_X_y=True)
X_train, X_dsel, y_train, y_dsel = train_test_split(X, y, test_size=0.5)

base_classifiers = BaggingClassifier()
base_classifiers.fit(X_train, y_train)

clf = MyClassifier(base_classifiers, k=1)

params = {'k': [1, 3, 5, 7]}
grid = GridSearchCV(clf, params)

grid.fit(X_dsel, y_dsel)  # Raises error that the bagging classifiers are not fitted
---------------------------
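The three steps listed in the message can be observed without GridSearchCV by calling clone directly. This is a minimal sketch under a few assumptions: a single DecisionTreeClassifier stands in for the pool, MyClassifier is reduced to its constructor, and the mixin is listed before BaseEstimator (the order recent scikit-learn releases recommend, unlike the snippet above). The presence of the tree_ attribute is used as the sign that the pool is fitted.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


class MyClassifier(ClassifierMixin, BaseEstimator):
    """Stripped-down version of the class from the message (constructor only)."""

    def __init__(self, base_classifiers, k=1):
        self.base_classifiers = base_classifiers
        self.k = k


X, y = load_iris(return_X_y=True)
pool = DecisionTreeClassifier(random_state=0).fit(X, y)  # fitted by the user
clf = MyClassifier(pool, k=1)

# Steps 1 and 2: clone() reads the constructor parameters through get_params().
params = clf.get_params(deep=False)
fitted_before = hasattr(params["base_classifiers"], "tree_")

# Step 3: every parameter is cloned in turn. A parameter that is itself an
# estimator is rebuilt from its own constructor parameters, so the attributes
# learned during fit are not carried over.
cloned = clone(clf)
fitted_after = hasattr(cloned.base_classifiers, "tree_")

print(fitted_before, fitted_after)  # True False
```

The original pool is left untouched by clone; only the copy held by the cloned estimator is unfitted.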
Btw, here is the branch that we are using to make the library compatible with sklearn: https://github.com/Menelau/DESlib/tree/sklearn-estimators. The failing test related to this issue is in https://github.com/Menelau/DESlib/blob/sklearn-estimators/deslib/tests/test_...
Thanks in advance for any help on this case,
Luiz Gustavo Hafemann
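One possible workaround for the question above, offered only as a sketch and not as a scikit-learn or DESlib API: clone falls back to a deep copy (when called with safe=False, as it is for parameters) on objects that are not estimators, i.e. that have no get_params method. Wrapping the fitted pool in a plain holder object therefore keeps it fitted through cloning. The FittedPool class is hypothetical, and a DecisionTreeClassifier is assumed as a stand-in for the pool.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier


class FittedPool:
    """Hypothetical holder: it has no get_params, so clone() deep-copies it."""

    def __init__(self, estimator):
        self.estimator = estimator

    def predict(self, X):
        return self.estimator.predict(X)


class MyClassifier(ClassifierMixin, BaseEstimator):
    def __init__(self, base_classifiers, k=1):
        self.base_classifiers = base_classifiers  # already fitted, wrapped in FittedPool
        self.k = k  # parameter searched over

    def fit(self, X_dsel, y_dsel):
        return self  # placeholder for fitting the dynamic selection step

    def predict(self, X):
        return self.base_classifiers.predict(X)


X, y = load_iris(return_X_y=True)
X_train, X_dsel, y_train, y_dsel = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

pool = FittedPool(DecisionTreeClassifier(random_state=0).fit(X_train, y_train))
clf = MyClassifier(pool, k=1)

# The holder is deep-copied, so the wrapped estimator keeps its fitted state.
cloned = clone(clf)
still_fitted = hasattr(cloned.base_classifiers.estimator, "tree_")

grid = GridSearchCV(clf, {"k": [1, 3, 5, 7]})
grid.fit(X_dsel, y_dsel)  # no "not fitted" error with the holder in place
print(still_fitted, grid.best_params_)
```

The trade-off is that the holder hides the pool's own hyperparameters from get_params and set_params, and every clone made during the search carries a full deep copy of the fitted pool.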
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
-- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/
------------------------------
End of scikit-learn Digest, Vol 30, Issue 14 ********************************************
Yes, I actually mentioned that on the roadmap thread. It should definitely be added.
On 09/19/2018 06:17 PM, Guillaume Lemaître wrote:
Actually I don't see anything mentioning it in the road map currently. Should it be added?
Sent from my phone - sorry to be brief and potential misspell.
*From:* luiz.gh@gmail.com *Sent:* 19 September 2018 7:12 pm *To:* scikit-learn@python.org *Reply to:* scikit-learn@python.org *Subject:* Re: [scikit-learn] Issues with clone for ensemble of, classifiers
Guillaume - thank you for the comments. Indeed, an approach to "freeze" a fitted classifier would solve our problem. The GitHub issue seems to have been inactive for a while, but I will check if anyone else is working on it.
Luiz Gustavo
participants (3)
- Andreas Mueller
- Guillaume Lemaître
- Luiz Gustavo Hafemann