[scikit-learn] How to not recalculate transformer in a Pipeline?

Joel Nothman joel.nothman at gmail.com
Mon Nov 28 18:13:00 EST 2016


A few brief points of history:


   - We have had PRs #3951
   <https://github.com/scikit-learn/scikit-learn/pull/3951> and #2086
   <https://github.com/scikit-learn/scikit-learn/pull/2086> that build
   memoising into Pipeline in one way or another.
   - Andy and I have previously discussed alternative ways to set
   parameters to avoid indirection issues created by wrappers. This can be
   achieved by setting the parameter space on the estimator itself, or by
   indicating parameters to *SearchCV shallowly with respect to an estimator
   instance, rather than using an indirected path. See #5082
   <https://github.com/scikit-learn/scikit-learn/issues/5082>.
   - The indirection is in parameter setting as well as in retrieving model
   attributes. My remember branch
   <https://github.com/jnothman/scikit-learn/commit/76cace9f104a575116492bea1a23e12e5e168789>
   gets around both indirections by creating a remember_transform wrapper, but
   it does so by hacking clone (as per #5080
   <https://github.com/scikit-learn/scikit-learn/issues/5080>), and doing
   some other magic.
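The wrapper indirection described above can be shown without scikit-learn. Below is a minimal sketch that uses only the standard library. It assumes a toy version of scikit-learn's `get_params`/`set_params` convention, in which `a__b` addresses parameter `b` of sub-estimator `a`. `Base`, `Scaler` and `CachedTransformer` are hypothetical names used only for illustration. This is not the code from the PRs or from the remember branch. The wrapper memoises `fit` on the transformer's parameters and the input data, but it adds an extra path component. The scaler's `factor` must be set as `transformer__factor`. Inside a pipeline step named, say, `scale` (also hypothetical), it would become `scale__transformer__factor` rather than `scale__factor`.

```python
import copy
import inspect


class Base:
    """Toy stand-in for scikit-learn's get_params/set_params convention."""

    def get_params(self):
        # Parameters are the constructor arguments, as in scikit-learn.
        names = [p for p in inspect.signature(type(self).__init__).parameters
                 if p != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        for name, value in params.items():
            head, sep, rest = name.partition("__")
            if sep:
                # "a__b" means: set parameter b on the sub-estimator a.
                getattr(self, head).set_params(**{rest: value})
            else:
                setattr(self, head, value)
        return self


class Scaler(Base):
    """Toy transformer that multiplies every value by `factor`."""
    fit_calls = 0  # counts real (uncached) fits across all instances

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, X):
        type(self).fit_calls += 1
        return self

    def transform(self, X):
        return [x * self.factor for x in X]


class CachedTransformer(Base):
    """Hypothetical memoising wrapper around a transformer."""
    _cache = {}  # shared toy cache; a real one might use joblib.Memory

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, X):
        params = self.transformer.get_params()
        key = (type(self.transformer).__name__,
               tuple(sorted(params.items())), tuple(X))
        if key not in self._cache:
            # Fit a copy so later set_params calls cannot mutate the cached fit.
            self._cache[key] = copy.deepcopy(self.transformer).fit(X)
        self.fitted_ = self._cache[key]
        return self

    def transform(self, X):
        return self.fitted_.transform(X)


step = CachedTransformer(Scaler(factor=2.0))
step.fit([1, 2, 3])
step.fit([1, 2, 3])  # same parameters and data: served from the cache
# The wrapper adds a level of indirection: "transformer__factor", not "factor".
step.set_params(transformer__factor=3.0)
step.fit([1, 2, 3])  # parameters changed: fitted again
print(Scaler.fit_calls, step.transform([1, 2]))  # 2 [3.0, 6.0]
```

Keying on `tuple(X)` works only for small, hashable toy inputs. A real implementation would need to hash arrays, which joblib's hashing handles.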


On 29 November 2016 at 09:17, Gael Varoquaux <gael.varoquaux at normalesup.org>
wrote:

> Actually, thinking a bit about this, the inconvenience with the pattern
> that I lay out below is that it adds an extra indirection in the
> parameter setting. One way to avoid this would be to have a subclass of
> the pipeline that includes memoizing. It would call a memoized version of
> fit.
>
> I think that it would be quite handy :).
>
> Should I open an issue on that?
>
> G
>
> On Mon, Nov 28, 2016 at 07:51:21PM +0100, Gael Varoquaux wrote:
> > On Mon, Nov 28, 2016 at 01:46:08PM -0500, Andreas Mueller wrote:
> > > I guess so. You'd handle parameters using an estimator_params dict in
> > > init and pass that to the caching function?
>
> > I'd try to set them on the estimator before passing it to the function, as
> > we do in standard scikit-learn, and joblib is clever enough to take that
> > into account when given the estimator as an argument of the function that
> > is memoized.
>
> > G
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
>
> --
>     Gael Varoquaux
>     Researcher, INRIA Parietal
>     NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
>     Phone:  ++ 33-1-69-08-79-68
>     http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
>
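Gael's proposal can be sketched the same way: a pipeline that memoises the fitting of its transformer steps while keeping the ordinary `step__param` names. This is a toy that uses only the standard library. `Step`, `AddConst`, `MeanPredictor` and `MemoPipeline` are hypothetical names, and `MemoPipeline` stands in for a `Pipeline` subclass rather than being one. As suggested in the thread, parameters are set on each estimator before fitting. The estimator's parameters and the data then form the cache key. A function cached with joblib's `Memory` would instead hash the estimator passed to it.

```python
import copy
import inspect


class Step:
    """Toy stand-in for scikit-learn's estimator parameter convention."""

    def get_params(self):
        names = [p for p in inspect.signature(type(self).__init__).parameters
                 if p != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class AddConst(Step):
    """Toy transformer that adds `c` to every value."""
    fit_calls = 0  # counts real (uncached) fits

    def __init__(self, c=0):
        self.c = c

    def fit(self, X, y=None):
        type(self).fit_calls += 1
        return self

    def transform(self, X):
        return [x + self.c for x in X]


class MeanPredictor(Step):
    """Toy final estimator: predicts the (shrunk) mean of y."""

    def __init__(self, shrink=1.0):
        self.shrink = shrink

    def fit(self, X, y):
        self.mean_ = self.shrink * sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MemoPipeline:
    """Hypothetical pipeline whose fit memoises every transformer step."""
    _cache = {}  # toy in-memory cache; joblib.Memory could play this role

    def __init__(self, steps):
        self.steps = steps

    def set_params(self, **params):
        # Same "step__param" paths as an ordinary pipeline: no extra level.
        named = dict(self.steps)
        for name, value in params.items():
            step_name, _, param = name.partition("__")
            named[step_name].set_params(**{param: value})
        return self

    def fit(self, X, y):
        self.fitted_ = []
        for _, est in self.steps[:-1]:
            # Parameters were already set on the estimator, so they enter
            # the cache key together with the data.
            key = (type(est).__name__, tuple(sorted(est.get_params().items())),
                   tuple(X), tuple(y))
            if key not in self._cache:
                fitted = copy.deepcopy(est).fit(X, y)
                self._cache[key] = (fitted, fitted.transform(X))
            fitted, X = self._cache[key]
            self.fitted_.append(fitted)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for est in self.fitted_:
            X = est.transform(X)
        return self.steps[-1][1].predict(X)


pipe = MemoPipeline([("add", AddConst(c=1)), ("mean", MeanPredictor())])
X, y = [1, 2, 3], [2.0, 4.0, 6.0]
for shrink in (1.0, 0.5, 0.25):  # only the final step's parameter changes
    pipe.set_params(mean__shrink=shrink).fit(X, y)
print(AddConst.fit_calls)  # 1
pipe.set_params(add__c=5).fit(X, y)  # transformer parameter changes
print(AddConst.fit_calls, pipe.predict([0, 0]))  # 2 [1.0, 1.0]
```

Changing only `mean__shrink` leaves the transformer's cache key unchanged, so `AddConst` is fitted once across the three fits. Changing `add__c` forces a refit. The class-level dictionary is shared between instances and grows without bound. That is acceptable for a sketch but not for real use.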