[scikit-learn] New Transformer (Guillaume Lemaître)
David Burns
david.mo.burns at gmail.com
Wed Feb 28 11:46:43 EST 2018
Thanks everyone for your suggestions.
I will have a look at PipeGraph, which might be a suitable option for
us as Guillaume suggested.
If it works out, I will share it.
Thanks
David
On 02/28/2018 08:29 AM, scikit-learn-request at python.org wrote:
> Today's Topics:
>
> 1. New Transformer (David Burns)
> 2. Re: New Transformer (Guillaume Lemaître)
> 3. Re: New Transformer (Manuel Castejón Limas)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 27 Feb 2018 12:02:27 -0500
> From: David Burns <david.mo.burns at gmail.com>
> To: scikit-learn at python.org
> Subject: [scikit-learn] New Transformer
> Message-ID: <726f2e70-63eb-783f-b470-5ea45af930e5 at gmail.com>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> First post on this mailing list.
>
> I have been working with time series data for a project, and thought I
> could contribute a new transformer to segment time series data using a
> sliding window, with variable overlap. I have attached a demonstration of
> how this would fit in the existing framework. The only challenge for me
> here is that the transformer needs to transform both the X and y
> variables in order to perform the segmentation. I am not sure from the
> documentation how to implement this in the framework.
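
A minimal sketch of the kind of segmentation described above, since the attachment itself is not reproduced here. The function name `segment`, its `width` and `overlap` parameters, and the choice to label each window with the target of its last sample are all assumptions made for illustration, not the contents of TimeSeriesSegment.py.

```python
import numpy as np


def segment(X, y, width, overlap=0.5):
    """Cut one time series into fixed-width, possibly overlapping windows.

    X: array of shape (n_samples, n_features); y: array of shape (n_samples,).
    Returns X_seg of shape (n_windows, width, n_features) and y_seg of
    shape (n_windows,).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if len(X) < width:
        raise ValueError("series is shorter than one window")

    # Distance between consecutive window starts, at least one sample.
    step = max(1, int(round(width * (1.0 - overlap))))
    starts = np.arange(0, len(X) - width + 1, step)

    X_seg = np.stack([X[s:s + width] for s in starts])
    # Assumption: each window takes the target of its last sample.
    y_seg = y[starts + width - 1]
    return X_seg, y_seg
```

For a series of 10 samples, `segment(X, y, width=4, overlap=0.5)` would yield 4 windows starting at samples 0, 2, 4 and 6. Because the number of rows changes in both X and y, this does not fit the usual `transform(X)` contract, which is the difficulty raised in this message.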
>
> Using overlapping segments is a great way to boost performance for time
> series classifiers, so this may be a worthwhile contribution for some in
> this area of ML. Ultimately, model_selection.TimeSeriesSplit would need to
> be modified to support overlapping segments, or a new class created to
> enable validation for this.
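
To illustrate why validation needs care here, the sketch below runs the existing `TimeSeriesSplit` over window indices. The window width, step, and count are hypothetical values chosen for the example. For each fold it counts how many raw samples the last training window shares with the first test window.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical setup: 8 windows of width 4, one starting every 2 samples,
# so neighbouring windows share 2 raw samples.
width, step, n_windows = 4, 2, 8
starts = np.arange(n_windows) * step

overlaps = []
splitter = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in splitter.split(starts.reshape(-1, 1)):
    # Exclusive end of the last training window, in raw-sample units.
    last_train_end = starts[train_idx[-1]] + width
    # First raw sample covered by the first test window.
    first_test_start = starts[test_idx[0]]
    overlaps.append(int(max(0, last_train_end - first_test_start)))

print(overlaps)
```

With these settings every fold shares 2 raw samples across the train/test boundary, which is presumably the kind of leakage a modified splitter would need to avoid.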
>
> Please let me know if this would be a worthwhile contribution, and if so
> how to go about transforming the target vector y in the framework /
> pipeline?
>
> Thanks!
>
> David Burns
>
>
>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: TimeSeriesSegment.py
> Type: text/x-python
> Size: 3336 bytes
> Desc: not available
> URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20180227/143ced86/attachment-0001.py>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 27 Feb 2018 19:42:52 +0100
> From: Guillaume Lemaître <g.lemaitre58 at gmail.com>
> To: Scikit-learn mailing list <scikit-learn at python.org>
> Subject: Re: [scikit-learn] New Transformer
> Message-ID:
> <CACDxx9gy91jwt+XJfgtnUb_5WvMv279dGums6autzFfsnFEJ2g at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Transforming y is a big deal :)
> You can refer to
> https://github.com/scikit-learn/enhancement_proposals/pull/2
> and the associated issues/PRs to see what is going on. This is probably an
> additional use case to think about when designing estimators that
> modify y.
>
> Regarding the pipeline, I assume that your strategy would be to resample at
> fit and do nothing at predict, is that right?
>
> NB: you could actually implement this sampling in a FunctionSampler of
> imblearn:
> http://contrib.scikit-learn.org/imbalanced-learn/dev/generated/imblearn.FunctionSampler.html#imblearn.FunctionSampler
> and then use the imblearn pipeline, which would apply the transform at fit
> time but not at predict.
>