[scikit-learn] New Transformer

Manuel Castejón Limas manuel.castejon at gmail.com
Wed Feb 28 08:28:20 EST 2018

Dear David,

We recently submitted PipeGraph as a sklearn contrib project. Although it
is an ongoing project and we are currently modifying its interface to make
it more suitable and useful for the sklearn community, I believe PipeGraph
can address the problems you describe.
If you need to define different (or identical) transformations for X and y,
you can do so by defining a separate step for each path; if you need
different paths for fit and predict, you can also define them in PipeGraph.
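To make the X-and-y issue concrete, here is a minimal sketch, in plain Python, of the kind of sliding-window segmentation David describes. It assumes that each element of X is one 1-D series with a single class label in y, that `overlap` counts the samples shared by consecutive windows, and that every window inherits its series' label. The function `segment` is hypothetical and is not part of PipeGraph or scikit-learn.

```python
from typing import List, Sequence, Tuple


def segment(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    width: int,
    overlap: int,
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical helper: split each series in X into fixed-width windows.

    Consecutive windows share `overlap` samples, and every window inherits
    the label of the series it came from, so X and y grow together and
    stay aligned.
    """
    if width <= 0 or not 0 <= overlap < width:
        raise ValueError("need width > 0 and 0 <= overlap < width")
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    step = width - overlap
    X_seg: List[List[float]] = []
    y_seg: List[int] = []
    for series, label in zip(X, y):
        # Start of every full window; a trailing partial window is dropped.
        for start in range(0, len(series) - width + 1, step):
            X_seg.append(list(series[start:start + width]))
            y_seg.append(label)
    return X_seg, y_seg


X = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13]]
y = [0, 1]
X_seg, y_seg = segment(X, y, width=4, overlap=2)
# X_seg == [[0, 1, 2, 3], [2, 3, 4, 5], [10, 11, 12, 13]]
# y_seg == [0, 0, 1]
```

Because the output has more rows than the input and y changes along with X, a step like this does not fit the standard `transform(X)` interface, which only returns a new X.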
Please have a look at the general examples and judge for yourself whether
it fits your needs:


You can play with it using pip, for example:

pip install pipegraph

The API is still far from stable, and we are following the advice of the
sklearn community to make it as useful as possible, but in my humble
opinion PipeGraph can provide a suitable solution in situations like this.

Best regards

2018-02-27 19:42 GMT+01:00 Guillaume Lemaître <g.lemaitre58 at gmail.com>:

> Transforming y is a big deal :)
> You can refer to https://github.com/scikit-learn/enhancement_proposals/pull/2
> and the associated issues/PRs to see what is going on. This is probably an
> additional use case to think about when designing estimators that modify y.
> Regarding the pipeline, I assume that your strategy would be to resample at
> fit time and do nothing at predict time, is that right?
> NB: you could actually implement this sampling in a FunctionSampler of
> imblearn:
> http://contrib.scikit-learn.org/imbalanced-learn/dev/generated/imblearn.FunctionSampler.html#imblearn.FunctionSampler
> and then use the imblearn pipeline, which would apply the transform at fit
> time but not at predict time.
> On 27 February 2018 at 18:02, David Burns <david.mo.burns at gmail.com> wrote:
>> First post on this mailing list.
>> I have been working with time series data for a project and thought I
>> could contribute a new transformer that segments time series data using a
>> sliding window with variable overlap. I have attached a demonstration of
>> how this would fit into the existing framework. The only challenge is that
>> the transformer needs to transform both X and y in order to perform the
>> segmentation, and the documentation does not make clear to me how to
>> implement this within the framework.
>> Overlapping segments are a great way to boost performance for time series
>> classifiers, so this may be a worthwhile contribution for people working in
>> this area of ML. Ultimately, model_selection.TimeSeriesSplit would need to
>> be modified to support overlapping segments, or a new class created to
>> enable validation for this.
>> Please let me know whether this would be a worthwhile contribution and, if
>> so, how to go about transforming the target vector y in the framework /
>> pipeline.
>> Thanks!
>> David Burns
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
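Guillaume's suggestion relies on the imbalanced-learn pipeline applying a sampler step when it is fitted and skipping it at prediction time. The sketch below imitates that behaviour in plain Python so that it runs without imblearn installed; `FitOnlySamplerPipeline`, `MajorityClassifier` and `undersample` are hypothetical names for illustration, not imblearn or scikit-learn API. In imblearn itself, the sampler role would be played by a `FunctionSampler` wrapping a function that returns the resampled `(X, y)`.

```python
from collections import Counter


def undersample(X, y):
    """Hypothetical sampler: keep the first k samples of each class, where k
    is the size of the smallest class. It changes X and y together."""
    k = min(Counter(y).values())
    kept = Counter()
    X_res, y_res = [], []
    for xi, yi in zip(X, y):
        if kept[yi] < k:
            kept[yi] += 1
            X_res.append(xi)
            y_res.append(yi)
    return X_res, y_res


class MajorityClassifier:
    """Toy estimator: predicts the most common label seen during fit."""

    def fit(self, X, y):
        self.n_train_ = len(y)  # how many samples actually reached the estimator
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_] * len(X)


class FitOnlySamplerPipeline:
    """Hypothetical two-step pipeline: the sampler runs only in fit();
    predict() passes X straight to the estimator, one prediction per row."""

    def __init__(self, sampler, estimator):
        self.sampler = sampler
        self.estimator = estimator

    def fit(self, X, y):
        X_res, y_res = self.sampler(X, y)
        self.estimator.fit(X_res, y_res)
        return self

    def predict(self, X):
        return self.estimator.predict(X)


pipe = FitOnlySamplerPipeline(undersample, MajorityClassifier())
pipe.fit([[0], [1], [2], [3], [4]], [0, 0, 0, 1, 1])
print(pipe.estimator.n_train_)              # 4: one class-0 sample dropped at fit time
print(len(pipe.predict([[5], [6], [7]])))  # 3: nothing is resampled at predict time
```

The design point is that a sampler may change the number of rows, which is harmless during training but would break the one-prediction-per-input contract if it also ran at predict time.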