[scikit-learn] Any plans on generalizing Pipeline and transformers?

Manuel Castejón Limas manuel.castejon at gmail.com
Tue Dec 19 08:33:42 EST 2017


Wow, that seems promising. I'll read the imbalanced-learn code with interest.
Thanks for the info!
Manuel


2017-12-19 14:15 GMT+01:00 Christos Aridas <ichkoar at gmail.com>:

> Hey Manuel,
>
> In imbalanced-learn we have an extra type of estimator, called a Sampler,
> which can modify X and y at the same time through two new API methods,
> sample and fit_sample.
> We have also adopted a modified version of scikit-learn's Pipeline class
> that allows chaining samplers and transformers in the same pipeline.
> Even though the package is aimed at imbalanced datasets, these objects may
> help with your pipeline.
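>
> For illustration, a minimal sketch of how a sampler sits inside our
> Pipeline (X_train, y_train and X_test are placeholders; adjust the
> estimators to your problem):
>
>     from imblearn.pipeline import Pipeline
>     from imblearn.over_sampling import SMOTE
>     from sklearn.preprocessing import StandardScaler
>     from sklearn.linear_model import LogisticRegression
>
>     pipe = Pipeline([
>         ('scale', StandardScaler()),
>         ('sample', SMOTE()),           # a Sampler: resamples X and y together
>         ('clf', LogisticRegression()),
>     ])
>     pipe.fit(X_train, y_train)   # the sampler resamples only during fit
>     pipe.predict(X_test)         # at prediction time samplers are skipped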
>
> Cheerz,
> Chris
>
> On Tue, Dec 19, 2017 at 2:44 PM, Manuel Castejón Limas <
> manuel.castejon at gmail.com> wrote:
>
>> Dear all,
>>
>> Kudos to scikit-learn! Having said that, Pipeline is killing me because it
>> cannot transform anything other than X.
>>
>> My current use case would need:
>> - Transformers able to handle both X and y, e.g. clustering on X and y
>> concatenated (see the sketch after this list)
>> - A Pipeline able to change other parameters, e.g. sample_weight
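>>
>> As a toy illustration of the first item, something like this (the class
>> name XYClusterer is made up, and the transformer still needs y at
>> transform time, which is exactly what Pipeline will not give it):
>>
>>     import numpy as np
>>     from sklearn.base import BaseEstimator, TransformerMixin
>>     from sklearn.cluster import KMeans
>>
>>     class XYClusterer(BaseEstimator, TransformerMixin):
>>         def __init__(self, n_clusters=3):
>>             self.n_clusters = n_clusters
>>
>>         def fit(self, X, y=None):
>>             # cluster X and y concatenated
>>             self.kmeans_ = KMeans(n_clusters=self.n_clusters).fit(
>>                 np.column_stack([X, y]))
>>             return self
>>
>>         def transform(self, X, y=None):
>>             # y is needed again here, but Pipeline (and the default
>>             # fit_transform) only pass X to transform
>>             labels = self.kmeans_.predict(np.column_stack([X, y]))
>>             return np.column_stack([X, labels])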
>>
>> Currently, I augment X at every step with the extra information, which
>> seems to work fine for my_pipe.fit_transform(X_train, y_train) but breaks
>> on my_pipe.transform(X_test) because the y parameter is missing. I could
>> inherit from the Pipeline class and allow a y parameter in transform,
>> which is not ideal but I guess is an option (a sketch of that idea follows
>> below). The gritty part is having to adapt every regressor at the end of
>> the chain so that it splits the extra information from the raw data in X,
>> and not being able to generate more than one output from each
>> preprocessing step.
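>>
>> Something along these lines is what I have in mind (untested sketch; the
>> name XYPipeline is made up):
>>
>>     from sklearn.pipeline import Pipeline
>>
>>     class XYPipeline(Pipeline):
>>         def transform(self, X, y=None):
>>             # also accept y and hand it to steps written to take it
>>             Xt = X
>>             for _, step in self.steps:
>>                 if step is None or not hasattr(step, 'transform'):
>>                     continue
>>                 try:
>>                     Xt = step.transform(Xt, y)   # custom X/y transformer
>>                 except TypeError:
>>                     Xt = step.transform(Xt)      # plain transformer, X only
>>             return Xt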
>>
>> My current research involves clustering the data and using that cluster
>> assignment, together with X, to predict outliers. This produces
>> sample_weight information that I would love to use in the final regressor,
>> but currently there seems to be no option other than pasting that info
>> onto X.
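>>
>> I know Pipeline.fit can already route fit parameters to a named step with
>> the step__param syntax, e.g. (the step name 'reg' and the weight array w
>> are just placeholders):
>>
>>     from sklearn.pipeline import Pipeline
>>     from sklearn.preprocessing import StandardScaler
>>     from sklearn.linear_model import Ridge
>>
>>     pipe = Pipeline([('scale', StandardScaler()), ('reg', Ridge())])
>>     pipe.fit(X_train, y_train, reg__sample_weight=w)  # w computed beforehand
>>
>> but that only helps when the weights are known before calling fit, not
>> when they are generated by a step inside the pipeline itself.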
>>
>> All in all, I'm stuck with this API limitation and I would love to learn
>> some tricks from you if you could enlighten me.
>>
>> Thanks in advance!
>>
>> Manuel Castejón-Limas
>>
>>