[scikit-learn] passing y to .transform() in pipeline

Gaël Varoquaux gael.varoquaux at normalesup.org
Sun Jul 16 12:03:17 EDT 2023


The reason to separate fit from transform is to have an operation (transform) that does not use y. Indeed, in supervised learning, y is not available outside of fit time.
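
As a minimal sketch of this separation, assuming a hypothetical mean target encoder written in plain Python (the class name, the fallback for unseen categories, and the toy data are illustrative and not part of scikit-learn), y is consumed only in fit, and transform relies solely on what fit stored:

```python
from collections import defaultdict


class MeanTargetEncoder:
    """Hypothetical encoder: maps each category to the mean of y seen at fit time."""

    def fit(self, X, y):
        # y is used here only, at the one point where it is available.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for category, target in zip(X, y):
            sums[category] += target
            counts[category] += 1
        self.means_ = {c: sums[c] / counts[c] for c in sums}
        # Assumption: unseen categories fall back to the global mean of y.
        self.default_ = sum(y) / len(y)
        return self

    def transform(self, X):
        # No y argument: only the statistics stored by fit are needed.
        return [self.means_.get(c, self.default_) for c in X]


# Toy data, for illustration only.
encoder = MeanTargetEncoder().fit(["a", "b", "a", "b"], [1.0, 0.0, 3.0, 2.0])
encoded = encoder.transform(["a", "b", "c"])
```

Because transform takes no y, the same fitted object can be applied to new data for which the target is unknown.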

If, in your setting, y is always available and you want to transform the data using y, you may want to define fit_transform and not define fit or transform. You should then be able to call fit_transform on the Pipeline.
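
A rough sketch of such an object, assuming a hypothetical transformer in plain Python that exposes only fit_transform (the class name and toy data are illustrative). Whether Pipeline accepts it as a step depends on how the installed scikit-learn version validates its steps, so only the standalone call is shown:

```python
class GroupMeanWithResidual:
    """Hypothetical transformer that needs y every time it is applied."""

    def fit_transform(self, X, y):
        # Collect the targets observed for each category.
        groups = {}
        for category, target in zip(X, y):
            groups.setdefault(category, []).append(target)
        means = {c: sum(v) / len(v) for c, v in groups.items()}
        # One row per sample: the group mean and the sample's offset from it.
        return [[means[c], t - means[c]] for c, t in zip(X, y)]


# Toy data, for illustration only.
features = GroupMeanWithResidual().fit_transform(["a", "b", "a"], [1.0, 4.0, 3.0])
```

Since there is no separate transform, this object cannot be applied to data without a target, which is the trade-off of the pattern.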

Note, however, that this is a bit of an unusual pattern in scikit-learn, as it is not a classic machine learning setting.

Cheers, 

Gaël

On Jul 14, 2023, at 23:07, Florin Andrei <florin at andrei.myip.org> wrote:
>Any chance Pipeline will allow the target (y) to be passed to 
>transformers?
>
>I'm not talking about transforming y, although that would be nice.
>
>I'm just talking about having y passed as an argument to transformers
>in the .transform() call. That would allow me to easily run my own
>target encoders.
>
>Currently, not having y available during .transform() is very 
>restrictive.
>
>Thanks!
>
>-- 
>Florin Andrei
>https://florin.myip.org/