[scikit-learn] Tuning custom parameters using grid_search

Wed Sep 7 15:16:43 EDT 2016

Hi Sebastian,

thanks a lot. That was exactly what I was looking for! :)
I will have a look into the base classes of other preprocessing steps as 
well.

@Jacob
Thank you too! :)

Greets,
Piotr

On 07.09.2016 20:26, Sebastian Raschka wrote:
> Hi, Piotr,
>
>
>> These preprocessing steps have some parameters too, which I would like to tune.
>> I know that it is possible to tune the parameters of the preprocessing steps,
>> if they are part of my pipeline.
>> E.g. if I am using PCA, I could tune the parameter n_components, right?
>>
>> But what if I have some "custom" preprocessing code with some parameters?
>> Is it possible to create a scikit-compatible "object" of my custom code in order to tune the
>> parameters in the pipeline with grid search?
> Yeah, you can use the Pipeline class or the `make_pipeline` function and then create a custom estimator by subclassing the BaseEstimator class, like so:
>
>
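> from sklearn.base import BaseEstimator, TransformerMixin
> from sklearn.linear_model import LogisticRegression
> from sklearn.model_selection import GridSearchCV
> from sklearn.pipeline import make_pipeline
>
>
> class CustomEstimator(BaseEstimator, TransformerMixin):
>
>     def __init__(self, my_param=None):
>         # Store every constructor argument under its own name so that
>         # get_params()/set_params() (used by GridSearchCV) can find it.
>         self.my_param = my_param
>
>     def fit(self, X, y=None):
>         # Learn whatever the transformation needs from X here.
>         return self
>
>     def transform(self, X):
>         # Placeholder: return the data unchanged.
>         # TransformerMixin supplies fit_transform() from fit() + transform().
>         return X
>
>
> pipe = make_pipeline(CustomEstimator(),
>                      LogisticRegression())
> grid = {'customestimator__my_param': [3],
>         'logisticregression__C': [0.1, 1.0, 10.0]}
>
> gsearch1 = GridSearchCV(estimator=pipe, param_grid=grid)
>
> gsearch1.fit(X, y)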
>
>
> Then you can put your desired preprocessing logic into fit and transform.
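A minimal sketch of why the grid keys above work, assuming a recent scikit-learn: GridSearchCV clones the pipeline and calls `set_params` with `<step>__<param>` keys for each candidate, which only works if `__init__` stores each argument under the same attribute name. `CustomEstimator`, `my_param` and the value 3 come from the example above; the do-nothing transform is a placeholder.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


class CustomEstimator(BaseEstimator, TransformerMixin):
    def __init__(self, my_param=None):
        # Store the argument unchanged: get_params() reads it back by name.
        self.my_param = my_param

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Placeholder transformation: data passes through unchanged.
        return X


pipe = make_pipeline(CustomEstimator(), LogisticRegression())

# make_pipeline names each step after its lowercased class name,
# so nested parameters are addressed as "<step>__<param>".
print(list(pipe.named_steps))  # ['customestimator', 'logisticregression']

# Roughly what GridSearchCV does for each candidate in the grid:
# clone the unfitted pipeline, then set the candidate's parameters.
candidate = clone(pipe).set_params(customestimator__my_param=3,
                                   logisticregression__C=10.0)
print(candidate.get_params()["customestimator__my_param"])  # 3
```

Because `clone` builds a fresh copy, the original `pipe` keeps `my_param=None`; only the candidate carries the grid values.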
>
> Best,
> Sebastian
>
>> On Sep 7, 2016, at 2:03 PM, Piotr Bialecki <piotr.bialecki at hotmail.de> wrote:
>>
>> Hi all,
>>
>> I am currently tuning some parameters of my xgboost model using scikit-learn's grid search, e.g.:
>>
>> param_test1 = {'max_depth': range(3, 10, 2),
>>                'min_child_weight': range(1, 6, 2)}
>> gsearch1 = GridSearchCV(estimator=XGBClassifier(learning_rate=0.1, n_estimators=762,
>>                                                 max_depth=5, min_child_weight=1, gamma=0,
>>                                                 subsample=0.8, colsample_bytree=0.8,
>>                                                 objective='binary:logistic', nthread=4,
>>                                                 scale_pos_weight=1, seed=2809),
>>                         param_grid=param_test1,
>>                         scoring='roc_auc',
>>                         n_jobs=6,
>>                         iid=False, cv=5)
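To see how many models this search trains, the grid can be expanded with `ParameterGrid` (a sketch, assuming a current scikit-learn where it lives in `sklearn.model_selection`). `range(3, 10, 2)` yields 3, 5, 7, 9 and `range(1, 6, 2)` yields 1, 3, 5, so there are 4 * 3 = 12 candidates, each fitted once per fold with `cv=5`.

```python
from sklearn.model_selection import ParameterGrid

param_test1 = {'max_depth': range(3, 10, 2),
               'min_child_weight': range(1, 6, 2)}

grid = ParameterGrid(param_test1)
n_candidates = len(grid)   # product of the value-list lengths
n_cv_fits = n_candidates * 5  # one fit per candidate per fold (cv=5)

print(n_candidates)  # 12
print(n_cv_fits)     # 60
```

With the default `refit=True`, GridSearchCV adds one more fit of the best candidate on the full dataset after cross-validation. Adding preprocessing parameters to the same grid multiplies this count again, which is worth keeping in mind with `n_estimators=762`.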
>>
>> Before that, I preprocessed my dataset X with several different methods.
>> These preprocessing steps have some parameters too, which I would like to tune.
>> I know that it is possible to tune the parameters of the preprocessing steps,
>> if they are part of my pipeline.
>> E.g. if I am using PCA, I could tune the parameter n_components, right?
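A minimal sketch of tuning PCA's `n_components` inside a pipeline, assuming scikit-learn's bundled iris dataset as stand-in data and LogisticRegression as a stand-in classifier (the thread itself uses XGBClassifier).

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Stand-in data: iris has 4 features, so 1 to 3 components are valid.
X, y = load_iris(return_X_y=True)

# Steps are auto-named "pca" and "logisticregression".
pipe = make_pipeline(PCA(), LogisticRegression(max_iter=1000))

# Parameters of a pipeline step are addressed as "<step>__<param>".
param_grid = {'pca__n_components': [1, 2, 3]}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

Because PCA is fitted inside each training fold, the held-out fold never influences the projection, which is the main reason to tune preprocessing inside the pipeline rather than before the search.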
>>
>> But what if I have some "custom" preprocessing code with some parameters?
>> Is it possible to create a scikit-compatible "object" of my custom code in order to tune the
>> parameters in the pipeline with grid search?
>> Imagine I would like to write a custom method FeatureMultiplier() with a parameter multiplier_value.
>> Is it possible to create a scikit-compatible class out of this method and tune it with grid search?
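A minimal sketch of such a class, assuming the multiplier simply scales every feature (the thread does not say what FeatureMultiplier computes), with iris data and LogisticRegression standing in for the real dataset and XGBClassifier. Subclassing TransformerMixin supplies `fit_transform`, and storing `multiplier_value` unchanged in `__init__` is what lets grid search set it.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline


class FeatureMultiplier(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: multiplies every feature by a constant."""

    def __init__(self, multiplier_value=1.0):
        # Keep the constructor argument as-is; BaseEstimator's
        # get_params/set_params look it up by this exact name.
        self.multiplier_value = multiplier_value

    def fit(self, X, y=None):
        # Nothing to learn here; a real preprocessing step could
        # compute statistics from X and store them as attributes.
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) * self.multiplier_value


# Stand-in data and classifier for illustration only.
X, y = load_iris(return_X_y=True)
pipe = make_pipeline(FeatureMultiplier(), LogisticRegression(max_iter=1000))
param_grid = {'featuremultiplier__multiplier_value': [0.5, 1.0, 2.0],
              'logisticregression__C': [0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

The same pattern applies to any custom preprocessing code: put its tunable settings in `__init__`, the learning in `fit`, and the data change in `transform`.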
>>
>> I thought I saw a talk about exactly this topic at a PyData conference in 2015 or 2016,
>> but unfortunately I cannot find the video of it.
>> Maybe I misunderstood the presentation at that time.
>>
>>
>> Best regards,
>> Piotr
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn