[scikit-learn] How to not recalculate transformer in a Pipeline?

Anton Suchaneck a.suchaneck at gmail.com
Mon Nov 28 10:24:02 EST 2016


I use a 2-step Pipeline with an expensive transformer followed by a
classifier. On this I run GridSearchCV over the classifier parameters only.

Now, in theory GridSearchCV could know that I'm not touching any
parameters of the transformer and avoid redoing work by keeping the
transformed X, right?
As far as I can tell, GridSearchCV currently does a clean re-run of all
Pipeline steps for every parameter combination. Is that correct?
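
To make the setup concrete, here is a minimal sketch of what I mean. The
names ExpensiveTransformer and FIT_CALLS are made up for illustration, the
transformer does no real work (it only records each fit call), and
LogisticRegression on synthetic data is just a stand-in for my actual
classifier and data. It runs single-process so that the module-level
counter is shared.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Module-level log, so fits are counted across the clones GridSearchCV makes.
FIT_CALLS = []


class ExpensiveTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the expensive step; it only logs fit calls."""

    def fit(self, X, y=None):
        FIT_CALLS.append(X.shape[0])  # record the size of each training split
        return self

    def transform(self, X):
        return X  # identity: the real transformer would do heavy work here


# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=90, n_features=5, random_state=0)

pipe = Pipeline([
    ("transform", ExpensiveTransformer()),
    ("clf", LogisticRegression()),
])

# Only classifier parameters are searched; the transformer is never varied.
param_grid = {"clf__C": [0.1, 1.0]}

search = GridSearchCV(pipe, param_grid, cv=3, n_jobs=1)
search.fit(X, y)

print("transformer fits:", len(FIT_CALLS))
```

With 3 folds and 2 values of C, the transformer is fitted once per fold
and candidate, plus once more for the final refit on the full data
(3 * 2 + 1 = 7), although its parameters never change.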

Can you recommend the easiest way for me to use GridSearchCV+Pipeline while
avoiding recomputation of all transformer steps whose parameters are not in
the GridSearch? I realize this may be tricky, but any pointers on how to do
this conveniently and in a way that is compatible with sklearn would be
highly appreciated.

(The scoring has to be done on the initial data, so I cannot just manually
transform beforehand.)


PS: If all of that makes sense, would this be a useful feature to include in
sklearn?