[scikit-learn] caching transformers during hyper parameter optimization

Joel Nothman joel.nothman at gmail.com
Wed Aug 16 07:51:19 EDT 2017

We certainly considered this over the many years that Pipeline caching has
been in the pipeline. Storing the fitted model means we can do both a
fit_transform and a transform on new data, and in many cases it removes the
pain point of cross-validating over pipelines where only downstream steps are
varied.
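For reference, here is a minimal sketch of that built-in caching (assuming scikit-learn >= 0.19, where the `memory` parameter was added; the step names, dataset, and parameter grid are illustrative):

```python
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# memory= caches each *fitted* transformer, keyed on its parameters and
# input data, so a step is refit only when those actually change.
pipe = Pipeline(
    [("reduce", PCA(n_components=5)), ("clf", LogisticRegression())],
    memory=mkdtemp(),
)

# Only the downstream classifier varies across candidates, so the PCA
# step is fitted once and then reloaded from the cache thereafter.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)
```

Note this caches the fitted transformer objects, not the transformed data; `transform` is still re-run on each fold.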

What transformer are you using where the transform is costly? Or is it more
a matter of you wanting to store the transformed data at each step?

There are custom ways to do this sort of thing generically with a mixin if
you really want.
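One way to sketch such a mixin (purely hypothetical, not a scikit-learn API: `CachedTransformMixin` and `_cached_transform` are names invented here) is to memoize the transform output on disk with joblib:

```python
from tempfile import mkdtemp

import numpy as np
from joblib import Memory
from sklearn.preprocessing import StandardScaler

# Disk cache for transform results (the location is illustrative).
_memory = Memory(mkdtemp(), verbose=0)


class CachedTransformMixin:
    """Hypothetical mixin that memoizes transform() output on disk.

    Combine it with any transformer whose transform is deterministic
    given its fitted state and the input data.
    """

    def transform(self, X):
        return _cached_transform(self, X)


@_memory.cache
def _cached_transform(estimator, X):
    # joblib hashes (estimator, X): the same fitted state and input
    # data are read back from disk instead of being recomputed.
    return super(CachedTransformMixin, estimator).transform(X)


class CachedScaler(CachedTransformMixin, StandardScaler):
    """A StandardScaler whose transform output is cached."""


X = np.array([[1.0, 2.0], [3.0, 4.0]])
scaler = CachedScaler().fit(X)
first = scaler.transform(X)   # computed and written to the cache
second = scaler.transform(X)  # served from the disk cache
assert np.allclose(first, second)
```

Because the cache key includes the fitted estimator itself, changing any upstream hyperparameter invalidates the cached output automatically.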

On 16 August 2017 at 21:28, Georg Heiler <georg.kf.heiler at gmail.com> wrote:

> There is a new option in the pipeline:
> http://scikit-learn.org/stable/modules/pipeline.html#pipeline-cache
> How can I use this to also store the transformed data, as I only want to
> recompute the last step, i.e. the estimator, during hyperparameter tuning,
> and not the transform methods of the preceding cleaning steps?
> Is there a way to apply this to cross-validation? I would like all the
> folds to be precomputed and stored to disk in a folder.
> Regards,
> Georg
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn