[scikit-learn] [GridSearchCV] Reduction of elapsed time at the second iteration

Pedro Cardoso pedro.cardoso.code at gmail.com
Sat May 30 14:34:40 EDT 2020


Hey Guillaume,

first of all, thank you for the help. I checked my code and memory is
turned off (the parameter is left at its default). And yes, I am using a
different number of features every time.
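
For reference, this is my understanding of the opt-in caching you
mentioned: nothing is cached unless you pass memory=... to Pipeline, and
even then only the fitted transformer steps are cached, not the final
estimator. A minimal sketch (the scaler/SVC steps here are illustrative,
not my actual pipeline):

----------------------*code*-------------------------------

from tempfile import mkdtemp

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# With memory left at its default (None), nothing is cached between fits.
# Passing a directory opts in to caching the *transformer* steps only.
cachedir = mkdtemp()
pipe = Pipeline(
    [("scaler", StandardScaler()), ("svc", SVC())],
    memory=cachedir,  # omit this argument to keep the default (no caching)
)

-------------------------------------------------------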


Guillaume Lemaître <g.lemaitre58 at gmail.com> wrote on Wednesday,
27/05/2020 at 16:55:

> Regarding scikit-learn, the only thing that we cache is the transformer
> processing in the pipeline (see the memory parameter in Pipeline).
>
> It seems that you are passing a different set of features at each
> iteration. Is the number of features different?
>
> On Sun, 29 Mar 2020 at 19:23, Pedro Cardoso <pedro.cardoso.code at gmail.com>
> wrote:
>
>> Hello fellows,
>>
>> I am new to sklearn and I have a question about GridSearchCV:
>>
>> I am running the following code in a Jupyter notebook:
>>
>> ----------------------*code*-------------------------------
>>
>> opt_models = dict()
>> for feature in [features1, features2, features3, features4]:
>>     cmb = CMB(x_train, y_train, x_test, y_test, feature)
>>     cmb.fit()
>>     cmb.predict()
>>     opt_models[str(feature)] = cmb.get_best_model()
>>
>> -------------------------------------------------------
>>
>> The CMB class is just a wrapper around several classification models
>> (SVC, decision tree, etc.). When cmb.fit() runs, a GridSearchCV is
>> performed on the SVC model (which lives inside the cmb instance) in
>> order to tune the hyperparameters C, gamma, and kernel. The SVC model is
>> implemented using the sklearn.svm.SVC class; a rough sketch of the
>> search follows, and below that is the output of the first and second
>> iterations of the for loop:
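>>
>> (A minimal sketch only; the param_grid values are illustrative, not my
>> exact grid. The shape matches the log: 3 * 2 * 2 = 12 candidates over 5
>> folds, run in parallel.)
>>
>> ----------------------*code*-------------------------------
>>
>> from sklearn.model_selection import GridSearchCV
>> from sklearn.svm import SVC
>>
>> param_grid = {
>>     "C": [0.1, 1, 10],
>>     "gamma": ["scale", 0.01],
>>     "kernel": ["rbf", "linear"],
>> }  # 12 candidates, matching "12 candidates" in the log below
>>
>> # n_jobs=-1 uses all cores; the verbose output produces the
>> # [Parallel(...)] progress lines shown below
>> search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1, verbose=10)
>> search.fit(x_train, y_train)
>>
>> -------------------------------------------------------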
>>
>> ---------------------*output*-------------------------------------
>> -> 1st iteration
>>
>>
>> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>>
>> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
>> [Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    6.1s
>> [Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    6.1s
>> [Parallel(n_jobs=-1)]: Done   3 tasks      | elapsed:    6.1s
>> [Parallel(n_jobs=-1)]: Done   4 tasks      | elapsed:    6.2s
>> [Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    6.2s
>> [Parallel(n_jobs=-1)]: Done   6 tasks      | elapsed:    6.2s
>> [Parallel(n_jobs=-1)]: Done   7 tasks      | elapsed:    6.2s
>> [Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed:    6.2s
>> [Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    6.2s
>> [Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    6.2s
>> [Parallel(n_jobs=-1)]: Done  11 tasks      | elapsed:    6.2s
>> [Parallel(n_jobs=-1)]: Done  12 tasks      | elapsed:    6.3s
>> [Parallel(n_jobs=-1)]: Done  13 tasks      | elapsed:    6.3s
>> [Parallel(n_jobs=-1)]: Done  14 tasks      | elapsed:    6.3s
>> [Parallel(n_jobs=-1)]: Done  15 tasks      | elapsed:    6.4s
>> [Parallel(n_jobs=-1)]: Done  16 tasks      | elapsed:    6.4s
>> [Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:    6.4s
>> [Parallel(n_jobs=-1)]: Done  18 tasks      | elapsed:    6.4s
>> [Parallel(n_jobs=-1)]: Done  19 tasks      | elapsed:    6.5s
>> [Parallel(n_jobs=-1)]: Done  20 tasks      | elapsed:    6.5s
>> [Parallel(n_jobs=-1)]: Done  21 tasks      | elapsed:    6.5s
>> [Parallel(n_jobs=-1)]: Done  22 tasks      | elapsed:    6.6s
>> [Parallel(n_jobs=-1)]: Done  23 tasks      | elapsed:    6.7s
>> [Parallel(n_jobs=-1)]: Done  24 tasks      | elapsed:    6.7s
>> [Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    6.7s
>> [Parallel(n_jobs=-1)]: Done  26 tasks      | elapsed:    6.8s
>> [Parallel(n_jobs=-1)]: Done  27 tasks      | elapsed:    6.8s
>> [Parallel(n_jobs=-1)]: Done  28 tasks      | elapsed:    6.9s
>> [Parallel(n_jobs=-1)]: Done  29 tasks      | elapsed:    6.9s
>> [Parallel(n_jobs=-1)]: Done  30 tasks      | elapsed:    6.9s
>> [Parallel(n_jobs=-1)]: Done  31 tasks      | elapsed:    7.0s
>> [Parallel(n_jobs=-1)]: Done  32 tasks      | elapsed:    7.0s
>> [Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    7.0s
>> [Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    7.0s
>> [Parallel(n_jobs=-1)]: Done  35 tasks      | elapsed:    7.1s
>> [Parallel(n_jobs=-1)]: Done  36 tasks      | elapsed:    7.1s
>> [Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:    7.2s
>> [Parallel(n_jobs=-1)]: Done  38 tasks      | elapsed:    7.2s
>> [Parallel(n_jobs=-1)]: Done  39 tasks      | elapsed:    7.2s
>> [Parallel(n_jobs=-1)]: Done  40 tasks      | elapsed:    7.2s
>> [Parallel(n_jobs=-1)]: Done  41 tasks      | elapsed:    7.3s
>> [Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:    7.3s
>> [Parallel(n_jobs=-1)]: Done  43 tasks      | elapsed:    7.3s
>> [Parallel(n_jobs=-1)]: Done  44 tasks      | elapsed:    7.4s
>> [Parallel(n_jobs=-1)]: Done  45 tasks      | elapsed:    7.4s
>> [Parallel(n_jobs=-1)]: Done  46 tasks      | elapsed:    7.5s
>>
>>
>> -> 2nd iteration
>>
>> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>>
>> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
>> [Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    0.0s
>> [Parallel(n_jobs=-1)]: Batch computation too fast (0.0260s.) Setting batch_size=14.
>> [Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    0.0s
>> [Parallel(n_jobs=-1)]: Done   3 tasks      | elapsed:    0.0s
>> [Parallel(n_jobs=-1)]: Done   4 tasks      | elapsed:    0.0s
>> [Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    0.0s
>> [Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:    0.7s finished
>>
>>
>> ---------------------------------------------------------------------------------------------------------------------
>>
>>
>> As you can see, the first iteration shows a much larger elapsed time
>> than the second. Does that make sense? I am afraid that the model is
>> doing some kind of caching or shortcut from the 1st iteration, which
>> could consequently degrade the model training/performance. I have
>> already read the sklearn documentation and I didn't see any warning/note
>> about this kind of behaviour.
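>>
>> One check I thought of (just a sketch using my own CMB class from
>> above): re-run the first feature set after the loop finishes and compare
>> its fit time and best score with the original cold run; if nothing is
>> cached across iterations, they should match.
>>
>> ----------------------*code*-------------------------------
>>
>> # rerun features1 after the loop; a cold-run-like elapsed time and an
>> # identical best score would suggest no state leaks between iterations
>> cmb_check = CMB(x_train, y_train, x_test, y_test, features1)
>> cmb_check.fit()      # compare this fit's elapsed time with the 1st run
>> cmb_check.predict()  # compare scores against opt_models[str(features1)]
>>
>> -------------------------------------------------------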
>>
>> Thank you very much for your time :)
>>
>>
>>
>>
>>
>
>
> --
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/