[scikit-learn] Any plans on generalizing Pipeline and transformers?

Tue Dec 19 07:44:54 EST 2017

Dear all,

Kudos to scikit-learn! Having said that, Pipeline is killing me because it
cannot transform anything other than X.

My current use case would need:
- Transformers being able to handle both X and y, e.g. clustering X and y
concatenated
- Pipeline being able to change other params, e.g. sample_weight
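
A minimal sketch of the first requirement, under stated assumptions: the
class name XYClusterLabeler is hypothetical, and assigning test samples to
the nearest centroid using only the X coordinates is just one possible
workaround for the missing y at transform time, not something scikit-learn
provides.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class XYClusterLabeler(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: clusters [X, y] in fit, labels X alone later."""

    def __init__(self, n_clusters=2, random_state=0):
        self.n_clusters = n_clusters
        self.random_state = random_state

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(len(X), -1)
        self.n_features_ = X.shape[1]
        # Cluster X and y concatenated; y is only available during fit.
        self.kmeans_ = KMeans(
            n_clusters=self.n_clusters, n_init=10, random_state=self.random_state
        ).fit(np.hstack([X, y]))
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Assumption: with no y here, use only the X part of each centroid.
        centers_x = self.kmeans_.cluster_centers_[:, : self.n_features_]
        dists = ((X[:, None, :] - centers_x[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Append the cluster label as an extra column of X.
        return np.hstack([X, labels[:, None]])


rng = np.random.RandomState(0)
X_train = rng.normal(size=(40, 2))
y_train = X_train[:, 0] + rng.normal(scale=0.1, size=40)
X_test = rng.normal(size=(10, 2))

my_pipe = Pipeline([
    ("cluster", XYClusterLabeler(n_clusters=3)),
    ("reg", LinearRegression()),
])
my_pipe.fit(X_train, y_train)      # y reaches the transformer's fit
y_pred = my_pipe.predict(X_test)   # no y needed at prediction time
```

The extra information still travels as a column of X, so the final
regressor sees the cluster label as an ordinary feature.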

Currently, I'm augmenting X with the extra information at every step. That
seems to work fine for my_pipe.fit_transform(X_train, y_train) but breaks
on my_pipe.transform(X_test) for lack of the y parameter. OK, I can
subclass Pipeline to allow the y parameter, which is not ideal, but I guess
it is an option. The nasty part comes when I have to adapt every regressor
at the end of the pipeline to split the extra information from the raw data
in X, and when I cannot generate more than one by-product from each
preprocessing step.
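
The "adapt every regressor" workaround could look roughly like the wrapper
below. This is only a sketch: the class name LastColumnAsWeight and the
convention that the last column of X holds the sample weight are
assumptions made for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import LinearRegression


class LastColumnAsWeight(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: last column of X is sample_weight, the rest is data."""

    def __init__(self, regressor):
        self.regressor = regressor

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.regressor_ = clone(self.regressor)
        # Split the pasted-on weight column from the raw features.
        self.regressor_.fit(X[:, :-1], y, sample_weight=X[:, -1])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # The weight column must still be present, but it is ignored here.
        return self.regressor_.predict(X[:, :-1])


# Toy data: y = 3 * x, except for one outlier that carries weight 0.
X_aug = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 0.0]])
y = np.array([0.0, 3.0, 6.0, 100.0])

model = LastColumnAsWeight(LinearRegression()).fit(X_aug, y)
```

Any regressor whose fit accepts sample_weight can be wrapped this way, but
every extra piece of information needs its own column convention.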

My current research involves clustering the data and using the resulting
cluster assignment along with X to predict outliers, which generates
sample_weight information that I would love to use in the final regressor.
Currently there seems to be no option other than pasting that information
onto X.
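
For comparison, Pipeline.fit does forward fit parameters written as
<step name>__<parameter>, so weights computed outside the pipeline can
reach the final regressor. In the sketch below the weighting rule
(inverse distance to the nearest KMeans centroid of [X, y]) is a made-up
placeholder, not the outlier model described above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X_train = rng.normal(size=(60, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

# Hypothetical weighting rule: cluster [X, y] and down-weight samples that
# lie far from their nearest centroid.
Xy = np.hstack([X_train, y_train[:, None]])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Xy)
dist_to_nearest = km.transform(Xy).min(axis=1)
weights = 1.0 / (1.0 + dist_to_nearest)

pipe = Pipeline([("scale", StandardScaler()), ("reg", Ridge(alpha=1.0))])
# "reg__sample_weight" is routed to Ridge.fit as sample_weight.
pipe.fit(X_train, y_train, reg__sample_weight=weights)
```

The weights still have to be computed before calling fit, outside the
pipeline, which is exactly the limitation in question.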

All in all, I'm stuck with this API limitation and I would love to learn
some tricks from you if you could enlighten me.

Thanks in advance!

Manuel Castejón-Limas