[scikit-learn] Contribution to sklearn: Cross validation of time series
Sylvain Marchienne
sylvain.marchienne at gmail.com
Fri Apr 28 12:13:05 EDT 2017
Hi Andres, hi Andy,
Indeed, in practice I have also needed to cross-validate time series differently from the way sklearn's TimeSeriesSplit does it.
I fully support the idea of such a contribution, Andres.
As Andy mentioned, the main alternative would be a "rolling window" or, as I usually call it, a "sliding window" technique.
I think this is what you meant. To make sure we understand each other, here is a short explanation:
Think of your data sorted chronologically along a time axis.
Set a constant test set length (an interval) which will "slide" forward over time.
The training set is then simply all the data that comes before the first sample of the test set.
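To make the description above concrete, here is a minimal sketch in plain Python. It is not sklearn code: the function name sliding_window_splits and its parameters (test_size, min_train_size, max_train_size) are hypothetical names chosen for illustration. It assumes the samples are already sorted chronologically and that the test window moves forward by its own length, so that test sets do not overlap.

```python
from typing import Iterator, List, Optional, Tuple


def sliding_window_splits(
    n_samples: int,
    test_size: int,
    min_train_size: int = 1,
    max_train_size: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for chronologically sorted data.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    if test_size < 1 or min_train_size < 1:
        raise ValueError("test_size and min_train_size must be >= 1")

    # The first test window starts right after the minimum training set.
    test_start = min_train_size
    while test_start + test_size <= n_samples:
        # Training set: everything before the first test sample ...
        train_start = 0
        if max_train_size is not None:
            # ... optionally limited to the most recent samples,
            # which gives a fixed-length training window (assumption).
            train_start = max(0, test_start - max_train_size)
        train = list(range(train_start, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test
        # Slide the test window forward by its own length (no overlap).
        test_start += test_size


# Variable-length training set: it grows as the test window slides.
for train, test in sliding_window_splits(n_samples=8, test_size=2, min_train_size=2):
    print(train, test)
```

Passing max_train_size=2 in the same call would keep only the two most recent samples in each training set, which is closer to the fixed-length training window Andy describes.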
I have attached a slide that I used in a presentation to illustrate this principle.
Andy, this probably isn't exactly what you had in mind, but I think it is close.
Thanks,
Sylvain
> On 28 Apr 2017, at 17:48, Andreas Mueller <t3kcit at gmail.com> wrote:
>
> Hey Andres.
> I think there might be a PR for that.
> Can you explain the minimum size of the training set? How is that used?
> I thought the other main option would be "rolling window" cross validation
> to use a fixed length cv training set.
>
> So the two options to me were rolling window and what we're doing right now.
> Can you elaborate on the other use cases, like minimum size of the training set
> and why you would want the other options with a variable length training set?
>
> Thanks,
> Andy
>
> On 04/27/2017 09:44 AM, andres lago wrote:
>> Hello,
>> I'd like to contribute new functionality to sklearn: cross validation of time series. It extends the current functionality implemented by TimeSeriesSplit.
>>
>> TimeSeriesSplit only allows the user to set the number of folds. In real life, when performing the cross validation of time series, other parameters are required, for instance:
>> -minimum size of CV-training set
>> -size of CV-test set
>> -fixed or variable length of CV-training set.
>>
>> The functionality is inspired by the R library 'caret'.
>>
>> If you agree, I can share my code. I developed it for a project with the French rail company SNCF. It's in production now.
>>
>> Regards,
>> Andres
>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org <mailto:scikit-learn at python.org>
>> https://mail.python.org/mailman/listinfo/scikit-learn <https://mail.python.org/mailman/listinfo/scikit-learn>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Sliding_window.jpg
Type: image/jpeg
Size: 32063 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20170428/b5809a81/attachment-0001.jpg>