<div dir="ltr">Now this isn't the best example, because joblib.Memory isn't going to be very fast at dumping a list of strings, but I hope you can get the idea from <a href="https://gist.github.com/jnothman/019d594d197c98a3d6192fa0cb19c850">https://gist.github.com/jnothman/019d594d197c98a3d6192fa0cb19c850</a><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 17 August 2017 at 02:53, Georg Heiler <span dir="ltr"><<a href="mailto:georg.kf.heiler@gmail.com" target="_blank">georg.kf.heiler@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Data cleaning and enrichment <br><br>Could you link an example of a mixin?<br><br>Currently this is a bit of a mess, with custom pickle persistence in a big for loop and custom transformers. <br><br>Thanks. <br><span class="HOEnZb"><font color="#888888">Georg <br></font></span><div class="HOEnZb"><div class="h5"><div class="gmail_quote"><div dir="ltr">Joel Nothman <<a href="mailto:joel.nothman@gmail.com" target="_blank">joel.nothman@gmail.com</a>> wrote on Wed, 16 Aug 2017 at 13:51:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">We certainly considered this over the many years that Pipeline caching has been in the pipeline. Storing the fitted model means we can do both a fit_transform and a transform on new data, and in many cases it takes away the pain point of CV over pipelines where downstream steps are varied.<div><br></div><div>What transformer are you using where the transform is costly? 
Or is it more a matter of you wanting to store the transformed data at each step?</div><div><br></div><div>There are custom ways to do this sort of thing generically with a mixin if you really want.</div></div><div class="gmail_extra"><br><div class="gmail_quote"></div></div><div class="gmail_extra"><div class="gmail_quote">On 16 August 2017 at 21:28, Georg Heiler <span dir="ltr"><<a href="mailto:georg.kf.heiler@gmail.com" target="_blank">georg.kf.heiler@gmail.com</a>></span> wrote:<br></div></div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">There is a new option in the pipeline: <a href="http://scikit-learn.org/stable/modules/pipeline.html#pipeline-cache" target="_blank">http://scikit-learn.<wbr>org/stable/modules/pipeline.<wbr>html#pipeline-cache</a> <div>How can I use this to also store the transformed data? During hyperparameter tuning I only want to compute the last step, i.e. the estimator, and not the transform methods of the cleaning steps.</div><div><br></div><div>Is it possible to apply this to cross-validation? I would like to have all the folds precomputed and stored to disk in a folder.</div><div><br></div><div>Regards,</div><div>Georg</div></div>
<br></blockquote></div></div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">______________________________<wbr>_________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
<br></blockquote></div><br></div>
</blockquote></div>
</div></div><br>
<br></blockquote></div><br></div>
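<div dir="ltr">To make the mixin idea from this thread concrete, here is a minimal sketch of caching the output of transform on disk. It is not the code in the gist linked above: it uses only the standard library (pickle and hashlib) as a stand-in for joblib.Memory, and the names CachedTransformMixin, _transform, cache_dir, Lowercase and n_computed are all hypothetical. It assumes the host class offers get_params(), which scikit-learn's BaseEstimator supplies; the toy class below defines its own so the example runs without scikit-learn.</div>

```python
import hashlib
import os
import pickle
import tempfile


class CachedTransformMixin:
    """Hypothetical mixin that stores the output of transform() on disk.

    The host class must provide get_params(), _transform(X) and a
    cache_dir attribute; all three names are assumptions of this sketch.
    """

    def _cache_path(self, X):
        # Key the cache on the class, its parameters and the input data.
        params = sorted(self.get_params().items())
        payload = pickle.dumps((type(self).__name__, params, X))
        digest = hashlib.sha256(payload).hexdigest()
        return os.path.join(self.cache_dir, digest + ".pkl")

    def transform(self, X):
        path = self._cache_path(X)
        if os.path.exists(path):
            # Cache hit: load the stored result instead of recomputing.
            with open(path, "rb") as fh:
                return pickle.load(fh)
        result = self._transform(X)
        os.makedirs(self.cache_dir, exist_ok=True)
        with open(path, "wb") as fh:
            pickle.dump(result, fh)
        return result


class Lowercase(CachedTransformMixin):
    """Toy cleaning step standing in for an expensive transformer."""

    def __init__(self, cache_dir, strip=True):
        self.cache_dir = cache_dir
        self.strip = strip
        self.n_computed = 0  # counts real computations, not cache hits

    def get_params(self):
        return {"strip": self.strip}

    def _transform(self, X):
        self.n_computed += 1
        return [s.strip().lower() if self.strip else s.lower() for s in X]


cache_dir = tempfile.mkdtemp()
step = Lowercase(cache_dir)
first = step.transform([" Foo", "BAR "])
second = step.transform([" Foo", "BAR "])  # served from the on-disk cache
```

<div dir="ltr">Because the key covers both the parameters and the input, each cross-validation fold would get its own file in cache_dir, and changing a parameter of the step would trigger a recomputation. Hashing by pickling the whole input is a simplification that may be slow or unreliable for large arrays.</div>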