[scikit-learn] caching transformers during hyperparameter optimization
joel.nothman at gmail.com
Wed Aug 16 21:15:03 EDT 2017
Now this isn't the best example, because joblib.Memory isn't going to be
very fast at dumping a list of strings, but I hope you can get the idea:
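Something along these lines (a sketch, not an existing scikit-learn API;
the names StringCleaner, TransformCacheMixin and the cache path are just
for illustration):

    from sklearn.base import BaseEstimator
    from sklearn.externals.joblib import Memory  # or plain ``import joblib``

    memory = Memory('/tmp/transform_cache', verbose=0)

    def _transform_one(transformer, X):
        # A module-level function, so joblib.Memory can hash its arguments.
        # The transformer itself is part of the cache key, so refitting it
        # with different parameters will not reuse a stale result.
        return transformer._transform(X)

    _transform_one = memory.cache(_transform_one)

    class TransformCacheMixin(object):
        """Serve transform() results from an on-disk joblib cache."""
        def transform(self, X):
            return _transform_one(self, X)

    class StringCleaner(TransformCacheMixin, BaseEstimator):
        # Toy cleaning step whose output is a list of strings.
        def fit(self, X, y=None):
            return self

        def _transform(self, X):
            return [s.strip().lower() for s in X]

    docs = ['  Some TEXT ', ' MORE text  ']
    cleaned = StringCleaner().fit(docs).transform(docs)  # recomputed only on a cache miss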
On 17 August 2017 at 02:53, Georg Heiler <georg.kf.heiler at gmail.com> wrote:
> Data cleaning and enrichment.
> Could you link an example for a mixin?
> Currently this is a bit of a mess with custom pickle persistence in a big
> for loop and custom transformers.
> Joel Nothman <joel.nothman at gmail.com> wrote on Wed., 16 Aug. 2017:
>> We certainly considered this over the many years that Pipeline caching
>> has been in the pipeline. Storing the fitted model means we can do both a
>> fit_transform and a transform on new data, and in many cases takes away the
>> pain point of CV over pipelines where downstream steps are varied.
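>> For instance (a sketch; the cache directory and the parameter grid are
>> illustrative):
>>
>>     from sklearn.pipeline import Pipeline
>>     from sklearn.feature_extraction.text import TfidfVectorizer
>>     from sklearn.svm import LinearSVC
>>     from sklearn.model_selection import GridSearchCV
>>
>>     # memory= caches each fitted transformer on disk, so the grid
>>     # search below fits the vectorizer once per CV split rather than
>>     # once per candidate value of C
>>     pipe = Pipeline([('tfidf', TfidfVectorizer()),
>>                      ('clf', LinearSVC())],
>>                     memory='/tmp/pipeline_cache')
>>     search = GridSearchCV(pipe, {'clf__C': [0.1, 1.0, 10.0]})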
>> What transformer are you using where the transform is costly? Or is it
>> more a matter of you wanting to store the transformed data at each step?
>> There are ways to do this sort of thing generically with a custom mixin
>> if you really want.
>> On 16 August 2017 at 21:28, Georg Heiler <georg.kf.heiler at gmail.com>
>> wrote:
>>> There is a new option in the pipeline: http://scikit-learn.
>>> How can I use this to also store the transformed data? During
>>> hyperparameter tuning I only want to refit the last step, i.e. the
>>> estimator, and not re-run the transform methods of the cleaning steps.
>>> Is there a way to apply this to cross-validation? I would like all
>>> the folds to be precomputed and stored to disk in a folder.