[scikit-learn] [GridSearchCV] Reduction of elapsed time at the second iteration

Guillaume Lemaître g.lemaitre58 at gmail.com
Wed May 27 11:53:15 EDT 2020


Regarding scikit-learn, the only thing that we cache is the fitting of the
transformers in a Pipeline (see the memory parameter of Pipeline).
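
For instance, a minimal sketch (the transformer and estimator here are
illustrative, not taken from your code):

    from tempfile import mkdtemp
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # With memory set, fitted transformers are memoized on disk: refitting
    # the same transformer on the same data reuses the cached result.
    # Only the transformers are cached, never the final estimator.
    pipe = Pipeline(
        [("scale", StandardScaler()), ("clf", SVC())],
        memory=mkdtemp(),
    )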

It seems that you are passing a different set of features at each
iteration. Is the number of features different?
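
A quick way to check on your side is to time a single plain fit per feature
set, e.g. (assuming x_train is a DataFrame and each feature set is a list of
column names, as your loop suggests):

    import time
    from sklearn.svm import SVC

    for name, cols in [("features1", features1), ("features2", features2),
                       ("features3", features3), ("features4", features4)]:
        tic = time.perf_counter()
        SVC().fit(x_train[cols], y_train)  # one plain fit, no grid search
        print(f"{name}: {len(cols)} columns, "
              f"{time.perf_counter() - tic:.2f}s")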

On Sun, 29 Mar 2020 at 19:23, Pedro Cardoso <pedro.cardoso.code at gmail.com>
wrote:

> Hello fellows,
>
> I am new to sklearn and I have a question about GridSearchCV:
>
> I am running the following code in a Jupyter notebook:
>
> ----------------------*code*-------------------------------
>
> opt_models = dict()
> for feature in [features1, features2, features3, features4]:
>     cmb = CMB(x_train, y_train, x_test, y_test, feature)
>     cmb.fit()
>     cmb.predict()
>     opt_models[str(feature)] = cmb.get_best_model()
>
> -------------------------------------------------------
>
> The CMB class is just a class that contains different classification
> models (SVC, decision tree, etc...). When cmb.fit() is running, a
> GridSearchCV is performed on the SVC model (which is within the cmb
> instance) in order to tune the hyperparameters C, gamma, and kernel. The
> SVC model is implemented using the sklearn.svm.SVC class. Here is the
> output of the first and second iterations of the for loop:
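>
> The grid search inside cmb.fit() is along these lines (the parameter
> values shown are just representative; only the 5 folds, 12 candidates,
> and n_jobs=-1 match the logs below):
>
> from sklearn.model_selection import GridSearchCV
> from sklearn.svm import SVC
>
> # 4 C values x 3 gamma values x 1 kernel = 12 candidates
> param_grid = {
>     "C": [0.1, 1, 10, 100],
>     "gamma": ["scale", 0.01, 0.1],
>     "kernel": ["rbf"],
> }
> search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1, verbose=10)
> search.fit(x_train[feature], y_train)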
>
> ---------------------*output*-------------------------------------
> -> 1st iteration
>
>
> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>
> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
> [Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    6.1s
> [Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    6.1s
> [Parallel(n_jobs=-1)]: Done   3 tasks      | elapsed:    6.1s
> [Parallel(n_jobs=-1)]: Done   4 tasks      | elapsed:    6.2s
> [Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    6.2s
> [Parallel(n_jobs=-1)]: Done   6 tasks      | elapsed:    6.2s
> [Parallel(n_jobs=-1)]: Done   7 tasks      | elapsed:    6.2s
> [Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed:    6.2s
> [Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    6.2s
> [Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    6.2s
> [Parallel(n_jobs=-1)]: Done  11 tasks      | elapsed:    6.2s
> [Parallel(n_jobs=-1)]: Done  12 tasks      | elapsed:    6.3s
> [Parallel(n_jobs=-1)]: Done  13 tasks      | elapsed:    6.3s
> [Parallel(n_jobs=-1)]: Done  14 tasks      | elapsed:    6.3s
> [Parallel(n_jobs=-1)]: Done  15 tasks      | elapsed:    6.4s
> [Parallel(n_jobs=-1)]: Done  16 tasks      | elapsed:    6.4s
> [Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:    6.4s
> [Parallel(n_jobs=-1)]: Done  18 tasks      | elapsed:    6.4s
> [Parallel(n_jobs=-1)]: Done  19 tasks      | elapsed:    6.5s
> [Parallel(n_jobs=-1)]: Done  20 tasks      | elapsed:    6.5s
> [Parallel(n_jobs=-1)]: Done  21 tasks      | elapsed:    6.5s
> [Parallel(n_jobs=-1)]: Done  22 tasks      | elapsed:    6.6s
> [Parallel(n_jobs=-1)]: Done  23 tasks      | elapsed:    6.7s
> [Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:    6.7s
> [Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    6.7s
> [Parallel(n_jobs=-1)]: Done  26 tasks      | elapsed:    6.8s
> [Parallel(n_jobs=-1)]: Done  27 tasks      | elapsed:    6.8s
> [Parallel(n_jobs=-1)]: Done  28 tasks      | elapsed:    6.9s
> [Parallel(n_jobs=-1)]: Done  29 tasks      | elapsed:    6.9s
> [Parallel(n_jobs=-1)]: Done  30 tasks      | elapsed:    6.9s
> [Parallel(n_jobs=-1)]: Done  31 tasks      | elapsed:    7.0s
> [Parallel(n_jobs=-1)]: Done  32 tasks      | elapsed:    7.0s
> [Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    7.0s
> [Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    7.0s
> [Parallel(n_jobs=-1)]: Done  35 tasks      | elapsed:    7.1s
> [Parallel(n_jobs=-1)]: Done  36 tasks      | elapsed:    7.1s
> [Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:    7.2s
> [Parallel(n_jobs=-1)]: Done  38 tasks      | elapsed:    7.2s
> [Parallel(n_jobs=-1)]: Done  39 tasks      | elapsed:    7.2s
> [Parallel(n_jobs=-1)]: Done  40 tasks      | elapsed:    7.2s
> [Parallel(n_jobs=-1)]: Done  41 tasks      | elapsed:    7.3s
> [Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    7.3s
> [Parallel(n_jobs=-1)]: Done  43 tasks      | elapsed:    7.3s
> [Parallel(n_jobs=-1)]: Done  44 tasks      | elapsed:    7.4s
> [Parallel(n_jobs=-1)]: Done  45 tasks      | elapsed:    7.4s
> [Parallel(n_jobs=-1)]: Done  46 tasks      | elapsed:    7.5s
>
>
> -> 2nd iteration
>
> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>
> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
> [Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    0.0s
> [Parallel(n_jobs=-1)]: Batch computation too fast (0.0260s.) Setting batch_size=14.
> [Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    0.0s
> [Parallel(n_jobs=-1)]: Done   3 tasks      | elapsed:    0.0s
> [Parallel(n_jobs=-1)]: Done   4 tasks      | elapsed:    0.0s
> [Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    0.0s
> [Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:    0.7s finished
>
>
> ---------------------------------------------------------------------------------------------------------------------
>
>
> As you can see, the first iteration shows a much larger elapsed time than
> the second one. Does this make sense? I am afraid that the model is doing
> some kind of caching or taking a shortcut from the 1st iteration, which
> could consequently affect the model training/performance. I already read
> the sklearn documentation and I didn't see any warning/note about this
> kind of behaviour.
>
> Thank you very much for your time :)
>


-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

