[scikit-learn] [GridSearchCV] Reduction of elapsed time at the second iteration
Guillaume Lemaître
g.lemaitre58 at gmail.com
Wed May 27 11:53:15 EDT 2020
Regarding scikit-learn, the only thing we cache is the fitted transformers
in a pipeline (see the memory parameter of Pipeline), and only when you
opt in to it.
It seems that you are passing a different set of features at each
iteration. Is the number of features different?
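
Note also that joblib's loky backend reuses its worker processes within a
session, so the first grid search pays a one-off cost of spawning the four
workers that later calls do not; that overhead has no effect on the fitted
models themselves.

If you do want to cache the transformer steps, here is a minimal sketch of
the mechanism (the estimators and data are only illustrative):

from tempfile import mkdtemp
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data, just to make the example runnable.
X, y = make_classification(n_samples=200, random_state=0)

# With memory set, fitted transformers are cached on disk; refitting
# with the same data and parameters reloads the cached result instead
# of recomputing the transform. The final estimator is never cached.
pipe = Pipeline(
    [("scaler", StandardScaler()), ("svc", SVC())],
    memory=mkdtemp(),
)
pipe.fit(X, y)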
On Sun, 29 Mar 2020 at 19:23, Pedro Cardoso <pedro.cardoso.code at gmail.com>
wrote:
> Hello fellows,
>
> I am new to sklearn and I have a question about GridSearchCV:
>
> I am running the following code in a Jupyter notebook:
>
> ----------------------*code*-------------------------------
>
> opt_models = dict()
> for feature in [features1, features2, features3, features4]:
>     cmb = CMB(x_train, y_train, x_test, y_test, feature)
>     cmb.fit()
>     cmb.predict()
>     opt_models[str(feature)] = cmb.get_best_model()
>
> -------------------------------------------------------
>
> The CMB class is just a class that contains different classification
> models (SVC, decision tree, etc.). When cmb.fit() runs, a
> GridSearchCV is performed on the SVC model (which is within the cmb
> instance) in order to tune the hyperparameters C, gamma, and kernel. The
> SVC model is implemented using the sklearn.svm.SVC class. Here is the
> output of the first and second iterations of the for loop:
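>
> To make it concrete, the search inside cmb.fit() is roughly equivalent to
> the following (the grid values here are only an example; the real ones
> live inside CMB):
>
> from sklearn.model_selection import GridSearchCV
> from sklearn.svm import SVC
>
> # Example grid: 3 * 2 * 2 = 12 candidates, times 5 folds = 60 fits,
> # which matches the log output below.
> param_grid = {
>     "C": [0.1, 1, 10],
>     "gamma": ["scale", 0.01],
>     "kernel": ["rbf", "linear"],
> }
> search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1, verbose=10)
> search.fit(x_train, y_train)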
>
> ---------------------*output*-------------------------------------
> -> 1st iteration
>
>
> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>
> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
> [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 6.1s
> [Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 6.1s
> [Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 6.1s
> [Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 6 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 7 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 8 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 10 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 11 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 12 tasks | elapsed: 6.3s
> [Parallel(n_jobs=-1)]: Done 13 tasks | elapsed: 6.3s
> [Parallel(n_jobs=-1)]: Done 14 tasks | elapsed: 6.3s
> [Parallel(n_jobs=-1)]: Done 15 tasks | elapsed: 6.4s
> [Parallel(n_jobs=-1)]: Done 16 tasks | elapsed: 6.4s
> [Parallel(n_jobs=-1)]: Done 17 tasks | elapsed: 6.4s
> [Parallel(n_jobs=-1)]: Done 18 tasks | elapsed: 6.4s
> [Parallel(n_jobs=-1)]: Done 19 tasks | elapsed: 6.5s
> [Parallel(n_jobs=-1)]: Done 20 tasks | elapsed: 6.5s
> [Parallel(n_jobs=-1)]: Done 21 tasks | elapsed: 6.5s
> [Parallel(n_jobs=-1)]: Done 22 tasks | elapsed: 6.6s
> [Parallel(n_jobs=-1)]: Done 23 tasks | elapsed: 6.7s
> [Parallel(n_jobs=-1)]: Done 24 tasks | elapsed: 6.7s
> [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 6.7s
> [Parallel(n_jobs=-1)]: Done 26 tasks | elapsed: 6.8s
> [Parallel(n_jobs=-1)]: Done 27 tasks | elapsed: 6.8s
> [Parallel(n_jobs=-1)]: Done 28 tasks | elapsed: 6.9s
> [Parallel(n_jobs=-1)]: Done 29 tasks | elapsed: 6.9s
> [Parallel(n_jobs=-1)]: Done 30 tasks | elapsed: 6.9s
> [Parallel(n_jobs=-1)]: Done 31 tasks | elapsed: 7.0s
> [Parallel(n_jobs=-1)]: Done 32 tasks | elapsed: 7.0s
> [Parallel(n_jobs=-1)]: Done 33 tasks | elapsed: 7.0s
> [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 7.0s
> [Parallel(n_jobs=-1)]: Done 35 tasks | elapsed: 7.1s
> [Parallel(n_jobs=-1)]: Done 36 tasks | elapsed: 7.1s
> [Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 7.2s
> [Parallel(n_jobs=-1)]: Done 38 tasks | elapsed: 7.2s
> [Parallel(n_jobs=-1)]: Done 39 tasks | elapsed: 7.2s
> [Parallel(n_jobs=-1)]: Done 40 tasks | elapsed: 7.2s
> [Parallel(n_jobs=-1)]: Done 41 tasks | elapsed: 7.3s
> [Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 7.3s
> [Parallel(n_jobs=-1)]: Done 43 tasks | elapsed: 7.3s
> [Parallel(n_jobs=-1)]: Done 44 tasks | elapsed: 7.4s
> [Parallel(n_jobs=-1)]: Done 45 tasks | elapsed: 7.4s
> [Parallel(n_jobs=-1)]: Done 46 tasks | elapsed: 7.5s
>
>
> -> 2nd iteration
>
> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>
> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
> [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.0s
> [Parallel(n_jobs=-1)]: Batch computation too fast (0.0260s.) Setting batch_size=14.
> [Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 0.0s
> [Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 0.0s
> [Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 0.0s
> [Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 0.0s
> [Parallel(n_jobs=-1)]: Done 60 out of 60 | elapsed: 0.7s finished
>
>
> ---------------------------------------------------------------------------------------------------------------------
>
>
> As you can see, the first iteration has a much larger elapsed time than
> the 2nd one. Does that make sense? I am afraid that the model is doing
> some kind of caching or taking a shortcut from the 1st iteration, which
> could consequently degrade the model training/performance. I already read
> the sklearn documentation and I didn't see any warning/note about this
> kind of behaviour.
>
> Thank you very much for your time :)
>
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/