[scikit-learn] Tuning custom parameters using grid_search

Wed Sep 7 14:26:55 EDT 2016

Hi, Piotr,

> These preprocessing steps have some parameters too, which I would like to tune.
> I know that it is possible to tune the parameters of the preprocessing steps, 
> if they are part of my pipeline.
> E.g. if I am using PCA, I could tune the parameter n_components, right?
> 
> But what if I have some "custom" preprocessing code with some parameters?
> Is it possible to create a scikit-compatible "object" of my custom code in order to tune the
> parameters in the pipeline with grid search?

Yes. You can use the Pipeline class or the `make_pipeline` function and write your preprocessing code as a custom transformer that subclasses BaseEstimator, like so:

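from sklearn.base import BaseEstimator
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline


class CustomEstimator(BaseEstimator):

    def __init__(self, my_param=None):
        # Store every constructor argument unchanged under the same name;
        # BaseEstimator.get_params()/set_params(), which GridSearchCV uses,
        # read and write the parameters through these attributes.
        self.my_param = my_param

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def transform(self, X):
        # Apply the preprocessing here (this placeholder returns X unchanged).
        return X

    def fit(self, X, y=None):
        # Learn whatever the preprocessing needs from the training data here.
        return self


X, y = load_iris(return_X_y=True)  # toy data; substitute your own X, y

pipe = make_pipeline(CustomEstimator(),
                     LogisticRegression())
grid = {'customestimator__my_param': [3],
        'logisticregression__C': [0.1, 1.0, 10.0]}

gsearch1 = GridSearchCV(estimator=pipe, param_grid=grid)

gsearch1.fit(X, y)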
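The keys in `grid` follow the `<step name>__<parameter name>` convention, and `make_pipeline` names each step after its lowercased class name. The same mechanism covers the PCA question. A minimal sketch, assuming the iris toy dataset and LogisticRegression as stand-ins for your own data and model, that checks a key exists and tunes `pca__n_components`:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)  # toy stand-in for your own X, y

# make_pipeline names the steps 'pca' and 'logisticregression'.
pipe = make_pipeline(PCA(), LogisticRegression(max_iter=1000))

# get_params() lists every tunable key in '<step>__<param>' form.
print('pca__n_components' in pipe.get_params())  # True

grid = {'pca__n_components': [1, 2, 3],
        'logisticregression__C': [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid=grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```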

Then put your own preprocessing logic into `fit` and `transform`.
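Applied to the FeatureMultiplier example from the original question, a minimal sketch might look like the following. The names FeatureMultiplier and multiplier_value come from the question; the class body, the candidate values, and the use of iris plus LogisticRegression as stand-ins for the real data and model are illustrative assumptions. Inheriting from TransformerMixin supplies `fit_transform` automatically, so only `fit` and `transform` need to be written.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline


class FeatureMultiplier(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: multiplies every feature by multiplier_value."""

    def __init__(self, multiplier_value=1.0):
        # Store the argument unchanged; GridSearchCV sets it via set_params().
        self.multiplier_value = multiplier_value

    def fit(self, X, y=None):
        # Nothing to learn from the data for a constant multiplier.
        return self

    def transform(self, X):
        return np.asarray(X) * self.multiplier_value


X, y = load_iris(return_X_y=True)  # toy stand-in for your own X, y

# The step is named 'featuremultiplier' (lowercased class name).
pipe = make_pipeline(FeatureMultiplier(), LogisticRegression(max_iter=1000))
grid = {'featuremultiplier__multiplier_value': [0.5, 1.0, 2.0],
        'logisticregression__C': [0.1, 1.0]}

search = GridSearchCV(pipe, param_grid=grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```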

Best,
Sebastian

> On Sep 7, 2016, at 2:03 PM, Piotr Bialecki <piotr.bialecki at hotmail.de> wrote:
> 
> Hi all,
> 
> I am currently tuning some parameters of my xgboost model using scikit's grid_search, e.g.:
> 
> param_test1 = {'max_depth': range(3, 10, 2),
>                'min_child_weight': range(1, 6, 2)}
> gsearch1 = GridSearchCV(estimator=XGBClassifier(learning_rate=0.1, n_estimators=762,
>                                                 max_depth=5, min_child_weight=1, gamma=0,
>                                                 subsample=0.8, colsample_bytree=0.8,
>                                                 objective='binary:logistic', nthread=4,
>                                                 scale_pos_weight=1, seed=2809),
>                         param_grid=param_test1,
>                         scoring='roc_auc',
>                         n_jobs=6,
>                         iid=False, cv=5)
> 
> Before that, I preprocessed my dataset X with several different methods.
> These preprocessing steps have some parameters too, which I would like to tune.
> I know that it is possible to tune the parameters of the preprocessing steps, 
> if they are part of my pipeline.
> E.g. if I am using PCA, I could tune the parameter n_components, right?
> 
> But what if I have some "custom" preprocessing code with some parameters?
> Is it possible to create a scikit-compatible "object" of my custom code in order to tune the
> parameters in the pipeline with grid search?
> Imagine I would like to write a custom method FeatureMultiplier() with a parameter multiplier_value.
> Is it possible to create a scikit-compatible class out of this method and tune it with grid search?
> 
> I think I saw a talk on exactly this topic at a PyData conference in 2015 or 2016,
> but unfortunately I cannot find the video of it.
> Maybe I misunderstood the presentation at that time.
> 
> 
> Best regards,
> Piotr
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn