[scikit-learn] [GridSearchCV] Reduction of elapsed time at the second iteration
Pedro Cardoso
pedro.cardoso.code at gmail.com
Sat May 30 14:34:40 EDT 2020
Hey Guillaume,
first of all, thank you for the help. I checked my code and memory is
turned off (the parameter is at its default). And yes, I am using a
different number of features every time.
Guillaume Lemaître <g.lemaitre58 at gmail.com> wrote on Wednesday,
27/05/2020 at 16:55:
> Regarding scikit-learn, the only thing that we cache is the transformer
> processing in the pipeline (see the memory parameter in Pipeline).
>
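(As an illustration of the transformer caching Guillaume refers to — a minimal sketch, assuming a toy dataset and arbitrary step names; the only thing cached is the fitted transformer, never the final estimator:)

```python
# Sketch of Pipeline's `memory` parameter, which caches fitted
# transformers on disk. Dataset and step names are illustrative.
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

cache_dir = mkdtemp()  # fitted transformers are cached in this directory
pipe = Pipeline(
    steps=[("scale", StandardScaler()), ("clf", SVC())],
    memory=cache_dir,  # the default is None, i.e. no caching at all
)
pipe.fit(X, y)
print(pipe.score(X, y))
rmtree(cache_dir)  # clean up the cache directory
```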
> It seems that you are passing a different set of features at each
> iteration. Is the number of features different?
>
> On Sun, 29 Mar 2020 at 19:23, Pedro Cardoso <pedro.cardoso.code at gmail.com>
> wrote:
>
>> Hello fellows,
>>
>> I am new to sklearn and I have a question about GridSearchCV:
>>
>> I am running the following code in a Jupyter notebook:
>>
>> ----------------------*code*-------------------------------
>>
>> opt_models = dict()
>> for feature in [features1, features2, features3, features4]:
>>     cmb = CMB(x_train, y_train, x_test, y_test, feature)
>>     cmb.fit()
>>     cmb.predict()
>>     opt_models[str(feature)] = cmb.get_best_model()
>>
>> -------------------------------------------------------
>>
>> The CMB class is just a class that contains different classification
>> models (SVC, decision tree, etc.). When cmb.fit() runs, a
>> GridSearchCV is performed on the SVC model (which is within the cmb
>> instance) in order to tune the hyperparameters C, gamma, and kernel. The
>> SVC model is implemented using the sklearn.svm.SVC class. Here is the
>> output of the first and second iterations of the for loop:
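(A hedged sketch of the kind of grid search described here — tuning C, gamma, and kernel of an SVC with 5-fold CV. The dataset and grid values are illustrative assumptions; this grid happens to yield 12 candidates and 60 fits, matching the log below:)

```python
# Sketch: 5-fold grid search over an SVC's C, gamma and kernel.
# Dataset and parameter values are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.01],
    "kernel": ["rbf", "linear"],
}  # 3 * 2 * 2 = 12 candidates; with cv=5 that is 60 fits

search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1, verbose=1)
search.fit(X, y)
print(search.best_params_)
```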
>>
>> ---------------------*output*-------------------------------------
>> -> 1st iteration
>>
>>
>> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>>
>> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
>> [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 6.1s
>> [Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 6.1s
>> [Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 6.1s
>> [Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 6 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 7 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 8 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 10 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 11 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 12 tasks | elapsed: 6.3s
>> [Parallel(n_jobs=-1)]: Done 13 tasks | elapsed: 6.3s
>> [Parallel(n_jobs=-1)]: Done 14 tasks | elapsed: 6.3s
>> [Parallel(n_jobs=-1)]: Done 15 tasks | elapsed: 6.4s
>> [Parallel(n_jobs=-1)]: Done 16 tasks | elapsed: 6.4s
>> [Parallel(n_jobs=-1)]: Done 17 tasks | elapsed: 6.4s
>> [Parallel(n_jobs=-1)]: Done 18 tasks | elapsed: 6.4s
>> [Parallel(n_jobs=-1)]: Done 19 tasks | elapsed: 6.5s
>> [Parallel(n_jobs=-1)]: Done 20 tasks | elapsed: 6.5s
>> [Parallel(n_jobs=-1)]: Done 21 tasks | elapsed: 6.5s
>> [Parallel(n_jobs=-1)]: Done 22 tasks | elapsed: 6.6s
>> [Parallel(n_jobs=-1)]: Done 23 tasks | elapsed: 6.7s
>> [Parallel(n_jobs=-1)]: Done 24 tasks | elapsed: 6.7s
>> [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 6.7s
>> [Parallel(n_jobs=-1)]: Done 26 tasks | elapsed: 6.8s
>> [Parallel(n_jobs=-1)]: Done 27 tasks | elapsed: 6.8s
>> [Parallel(n_jobs=-1)]: Done 28 tasks | elapsed: 6.9s
>> [Parallel(n_jobs=-1)]: Done 29 tasks | elapsed: 6.9s
>> [Parallel(n_jobs=-1)]: Done 30 tasks | elapsed: 6.9s
>> [Parallel(n_jobs=-1)]: Done 31 tasks | elapsed: 7.0s
>> [Parallel(n_jobs=-1)]: Done 32 tasks | elapsed: 7.0s
>> [Parallel(n_jobs=-1)]: Done 33 tasks | elapsed: 7.0s
>> [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 7.0s
>> [Parallel(n_jobs=-1)]: Done 35 tasks | elapsed: 7.1s
>> [Parallel(n_jobs=-1)]: Done 36 tasks | elapsed: 7.1s
>> [Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 7.2s
>> [Parallel(n_jobs=-1)]: Done 38 tasks | elapsed: 7.2s
>> [Parallel(n_jobs=-1)]: Done 39 tasks | elapsed: 7.2s
>> [Parallel(n_jobs=-1)]: Done 40 tasks | elapsed: 7.2s
>> [Parallel(n_jobs=-1)]: Done 41 tasks | elapsed: 7.3s
>> [Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 7.3s
>> [Parallel(n_jobs=-1)]: Done 43 tasks | elapsed: 7.3s
>> [Parallel(n_jobs=-1)]: Done 44 tasks | elapsed: 7.4s
>> [Parallel(n_jobs=-1)]: Done 45 tasks | elapsed: 7.4s
>> [Parallel(n_jobs=-1)]: Done 46 tasks | elapsed: 7.5s
>>
>>
>> -> 2nd iteration
>>
>> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>>
>> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
>> [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.0s
>> [Parallel(n_jobs=-1)]: Batch computation too fast (0.0260s.) Setting batch_size=14.
>> [Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 0.0s
>> [Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 0.0s
>> [Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 0.0s
>> [Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 0.0s
>> [Parallel(n_jobs=-1)]: Done 60 out of 60 | elapsed: 0.7s finished
>>
>>
>> ---------------------------------------------------------------------------------------------------------------------
>>
>>
>> As you can see, the first iteration has a much larger elapsed time than
>> the second. Does that make sense? I am afraid that the model is doing
>> some kind of caching or taking a shortcut from the 1st iteration, which
>> could consequently degrade the model's training/performance. I have
>> already read the sklearn documentation and didn't see any warning/note
>> about this kind of behaviour.
>>
>> Thank you very much for your time :)
>>
>>
>>
>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>
>
> --
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
>