First post on this mailing list.

I have been working with time series data for a project, and thought I could contribute a new transformer that segments time series data using a sliding window with variable overlap. I have attached a demonstration of how this would fit into the existing framework. The only challenge for me is that the transformer needs to transform both X and y in order to perform the segmentation, and I am not sure from the documentation how to implement this in the framework.

Using overlapping segments is a great way to boost performance for time series classifiers, so this may be a worthwhile contribution for some in this area of ML. Ultimately, model_selection.TimeSeriesSplit would need to be modified to support overlapping segments, or a new class created to enable validation for this.

Please let me know if this would be a worthwhile contribution and, if so, how to go about transforming the target vector y in the framework / pipeline.

Thanks!
David Burns
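For readers following the thread, the sliding-window segmentation described above can be sketched roughly as follows. This is not the attached demonstration, only a minimal illustration: the function name `segment_series`, its parameters, and the choice to label each window with the target of its last sample are all assumptions made for the example.

```python
import numpy as np


def segment_series(X, y, width, overlap=0.5):
    """Cut a series into fixed-width, possibly overlapping windows.

    Hypothetical helper for illustration; it is not a scikit-learn API.
    Each window is labelled with the target of its last sample (an assumption).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if len(X) < width:
        raise ValueError("series is shorter than one window")

    # Distance between the starts of consecutive windows.
    step = max(1, int(round(width * (1.0 - overlap))))
    starts = np.arange(0, len(X) - width + 1, step)

    # Both X and y change length, which is why a plain transformer
    # (one that only returns a new X) is not enough here.
    X_seg = np.stack([X[s:s + width] for s in starts])
    y_seg = np.array([y[s + width - 1] for s in starts])
    return X_seg, y_seg


X = np.arange(10)
y = np.arange(10) * 10
X_seg, y_seg = segment_series(X, y, width=4, overlap=0.5)
print(X_seg.shape, y_seg)
```

With a width of 4 and 50% overlap, consecutive windows start 2 samples apart, so the number of rows in both X and y changes after segmentation.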
In reply to David Burns <david.mo.burns@gmail.com> (27 February 2018 at 18:02):

Transforming y is a big deal :) You can refer to https://github.com/scikit-learn/enhancement_proposals/pull/2 and the associated issues/PRs to see what is going on. This is probably an additional use case to think about when designing estimators that modify y.

Regarding the pipeline, I assume that your strategy would be to resample at fit and do nothing at predict, is that right?

NB: you could actually implement this sampling in a FunctionSampler of imblearn: http://contrib.scikit-learn.org/imbalanced-learn/dev/generated/imblearn.FunctionSampler.html#imblearn.FunctionSampler and then use the imblearn pipeline, which would apply the transform at fit time but not at predict.
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
-- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/
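The "resample at fit, do nothing at predict" behaviour mentioned in the reply can be illustrated with a small sketch. It deliberately does not use imblearn, so that it runs with NumPy alone; the classes `FitOnlyResamplingPipeline` and `NearestMeanClassifier` and the function `keep_every_other_row` are hypothetical stand-ins written for this example, not imblearn or scikit-learn APIs.

```python
import numpy as np


class NearestMeanClassifier:
    """Tiny stand-in estimator: predicts the class whose mean is closest."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every sample to every class mean.
        dist = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[dist.argmin(axis=1)]


class FitOnlyResamplingPipeline:
    """Hypothetical two-step pipeline: the resampler runs in fit only."""

    def __init__(self, resample_func, estimator):
        self.resample_func = resample_func
        self.estimator = estimator

    def fit(self, X, y):
        # The resampling function may change the number of rows in X and y.
        X_res, y_res = self.resample_func(np.asarray(X), np.asarray(y))
        self.n_fit_samples_ = len(X_res)
        self.estimator.fit(X_res, y_res)
        return self

    def predict(self, X):
        # No resampling here: every input row gets exactly one prediction.
        return self.estimator.predict(np.asarray(X))


def keep_every_other_row(X, y):
    """Example resampler that modifies X and y together."""
    return X[::2], y[::2]


X = np.array([[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

pipe = FitOnlyResamplingPipeline(keep_every_other_row, NearestMeanClassifier())
pipe.fit(X, y)
pred = pipe.predict(X)
print(pipe.n_fit_samples_, len(pred))
```

The estimator is trained on the resampled rows, while predict returns one value per original input row. For a segmentation step, the same windowing would also have to be applied to X at predict time, which is one reason this use case is harder than plain resampling.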
In reply to Guillaume Lemaître <g.lemaitre58@gmail.com> (2018-02-27 19:42 GMT+01:00):

Dear David,

We recently submitted PipeGraph as a sklearn contrib project. It is still an ongoing project, and we are currently modifying the interface to make it more suitable and useful for the sklearn community, but I believe that the problems you describe can be addressed by PipeGraph. If you need to define different or equal transformations for X and y, you can do so by simply defining different steps for each path; if you need different paths for fit and predict, it is also possible to define them in PipeGraph.

Please have a look at the general examples and judge for yourself whether it fits your needs: https://mcasl.github.io/PipeGraph/auto_examples/plot_4_example_combination_o...

You can try it out by installing it with pip, for example:

    pip install pipegraph

The API is still far from stable, and we are following the advice of the sklearn community to turn it into something as useful as possible, but it is my humble opinion that in situations like this PipeGraph can provide a suitable solution.

Best regards,
Manolo
Participants (3): David Burns, Guillaume Lemaître, Manuel Castejón Limas