<div dir="ltr"><p>Hello,</p>
<p>I am one of the developers of DESlib, a library for Dynamic Ensemble
Selection (DES) methods, and we are currently working to make the library
fully compatible with scikit-learn (so that we can submit it to
scikit-learn-contrib). We have "check_estimator" passing for most of the
classes, but I am now having trouble making the classes compatible with
GridSearchCV and other CV functions.</p>
<p>One of the main use cases of this library is to facilitate research
in this field, and this led to a design decision that the base classifiers
are fit by the user, and the DES methods receive a pool of base
classifiers that were already fit (this allows users to compare many DES
techniques with the same base classifiers). This is creating an issue with
GridSearchCV, since the clone function (defined in sklearn.base) is not
cloning the classes as we would like. It copies only the constructor
parameters, not the fitted state, so a parameter that is itself an
estimator comes back unfitted, whereas we would like the pool of base
classifiers to be deep-copied.</p>
<p>I analyzed this issue and I could not find a solution that does not
require changes to the scikit-learn code. Here is the sequence of steps
that causes the problem:</p>
<ol>
<li>GridSearchCV calls "clone" on the DES estimator (<a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/model_selection/_search.py#L677">link</a>)<br>
</li>
<li>The clone function calls the "get_params" method of the DES estimator (<a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py#L60-L63">link</a>,
line 60). We do not override this method, so it returns all the
constructor parameters, including the pool of classifiers (at this point,
they are still "fitted").</li>
<li>The clone function then clones each parameter with safe=False
(line 62). Because the pool of classifiers is itself an estimator (or a
list of estimators), it is rebuilt from its constructor parameters, so the
result is a pool that is no longer "fitted".</li>
</ol>
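<p>The three steps above can be reproduced without GridSearchCV by calling
"clone" directly. This is only a minimal sketch: PoolHolder is a
hypothetical stand-in for a DES estimator (it is not a DESlib class), and
the "estimators_" attribute of BaggingClassifier is used as a simple
indicator of whether the pool has been fitted.</p>
<pre>
```python
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier


class PoolHolder(BaseEstimator):
    """Hypothetical stand-in for a DES estimator: it only stores its parameters."""

    def __init__(self, base_classifiers=None, k=1):
        self.base_classifiers = base_classifiers
        self.k = k


X, y = load_iris(return_X_y=True)
pool = BaggingClassifier(n_estimators=3, random_state=0).fit(X, y)

original = PoolHolder(base_classifiers=pool, k=1)
cloned = clone(original)  # the same call that GridSearchCV makes internally

# Fitting a BaggingClassifier sets the estimators_ attribute.
pool_is_fitted = hasattr(original.base_classifiers, "estimators_")
# clone() rebuilt the pool from its constructor parameters only.
cloned_pool_is_fitted = hasattr(cloned.base_classifiers, "estimators_")

print("original pool fitted:", pool_is_fitted)
print("cloned pool fitted:", cloned_pool_is_fitted)
```
</pre>
<p>The plain parameter "k" survives the clone unchanged, while the pool is
replaced by a new, unfitted BaggingClassifier with the same
hyperparameters.</p>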
<p>The problem is that, to my knowledge, there is no way for my
classifier to tell "clone" that a parameter should always be
deep-copied. I see that other ensemble methods in sklearn always fit the
base classifiers within the "fit" method of the ensemble, so this problem
does not happen there. I would like to know if there is a solution to
this problem that still allows the base classifiers to be fitted
elsewhere.</p>
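<p>One possible workaround, sketched here under assumptions rather than
as an approach endorsed by scikit-learn or used in DESlib: "clone" falls
back to copy.deepcopy for any parameter that has no "get_params" method
when safe=False. Wrapping the fitted pool in a plain holder object
therefore keeps its fitted state through cloning. FittedPool and this
version of MyClassifier are hypothetical names used only for
illustration.</p>
<pre>
```python
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier


class FittedPool:
    """Hypothetical holder for an already fitted pool.

    It deliberately defines no get_params method, so clone(..., safe=False)
    deep-copies it instead of rebuilding the wrapped estimator unfitted.
    """

    def __init__(self, estimator):
        self.estimator = estimator

    def predict(self, X):
        return self.estimator.predict(X)


class MyClassifier(BaseEstimator):
    """Hypothetical DES-like estimator that receives a fitted pool."""

    def __init__(self, base_classifiers=None, k=1):
        self.base_classifiers = base_classifiers
        self.k = k

    def fit(self, X, y):
        return self  # the pool is assumed to be fitted already

    def predict(self, X):
        return self.base_classifiers.predict(X)


X, y = load_iris(return_X_y=True)
pool = FittedPool(BaggingClassifier(n_estimators=3, random_state=0).fit(X, y))

clf = MyClassifier(base_classifiers=pool, k=1)
cloned = clone(clf)

# The cloned pool is a separate, still fitted copy of the original one.
same_predictions = bool((cloned.predict(X) == clf.predict(X)).all())
print("clone predicts like the original:", same_predictions)
```
</pre>
<p>The trade-off is that the hyperparameters of the wrapped pool are no
longer visible through "get_params" / "set_params", so they cannot be
included in a grid search.</p>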
<p>Here is a short snippet that reproduces the issue:</p>
<p>---------------------------</p>
<pre>from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import load_iris


class MyClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, base_classifiers, k):
        self.base_classifiers = base_classifiers  # Base classifiers that are already trained
        self.k = k  # Simulate a parameter that we want to do a grid search on

    def fit(self, X_dsel, y_dsel):
        # Here we would fit any parameters for the dynamic selection method,
        # not the base classifiers
        return self

    def predict(self, X):
        # In practice the methods would do something with the predictions of
        # each classifier
        return self.base_classifiers.predict(X)


X, y = load_iris(return_X_y=True)
X_train, X_dsel, y_train, y_dsel = train_test_split(X, y, test_size=0.5)

base_classifiers = BaggingClassifier()
base_classifiers.fit(X_train, y_train)

clf = MyClassifier(base_classifiers, k=1)

params = {'k': [1, 3, 5, 7]}
grid = GridSearchCV(clf, params)

grid.fit(X_dsel, y_dsel)  # Raises error that the bagging classifiers are not fitted
</pre>
<p>---------------------------<br>
</p>
<p>Btw, here is the branch that we are using to make the library
compatible with sklearn:
<a href="https://github.com/Menelau/DESlib/tree/sklearn-estimators">https://github.com/Menelau/DESlib/tree/sklearn-estimators</a>. The failing
test related to this issue is in
<a href="https://github.com/Menelau/DESlib/blob/sklearn-estimators/deslib/tests/test_des_integration.py#L36">https://github.com/Menelau/DESlib/blob/sklearn-estimators/deslib/tests/test_des_integration.py#L36</a></p>
<p>Thanks in advance for any help with this issue,</p>
<p>Luiz Gustavo Hafemann<br>
</p></div>