[scikit-learn] Contribution to sklearn: Cross validation of time series

Sylvain Marchienne sylvain.marchienne at gmail.com
Fri Apr 28 12:13:05 EDT 2017


Hi Andres, hi Andy,

Indeed, in practice I have also needed to cross-validate time series differently from the way sklearn's TimeSeriesSplit does it.
I fully support the idea of such a contribution, Andres.

As Andy mentioned, the main option would be a "rolling window" or, as I usually call it, a "sliding window" technique.
I think this is what you meant. To make sure we understand each other, here is a short explanation:

Picture your data sorted chronologically along a time axis.
Fix a constant test-set length (interval) that "slides" forward over time.
The training set is then all the data that precedes the first sample of the test set.
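The scheme above can be sketched as a small index generator. This is only an illustration under assumptions not stated in the message: `sliding_test_window_splits` and `first_test_start` are hypothetical names, the test window is assumed to advance by its own length each fold, and the first test window is assumed to start at a user-chosen index.

```python
def sliding_test_window_splits(n_samples, test_size, first_test_start):
    """Yield (train, test) index lists for chronologically sorted samples.

    Hypothetical helper, not part of sklearn. Assumptions: the test window
    slides forward by its own length each fold, and the first test window
    starts at index `first_test_start`.
    """
    if test_size < 1 or first_test_start < 1:
        raise ValueError("test_size and first_test_start must be >= 1")
    test_start = first_test_start
    while test_start + test_size <= n_samples:
        # Training set: every sample before the first test sample.
        train = list(range(test_start))
        # Test set: a constant-length interval starting at test_start.
        test = list(range(test_start, test_start + test_size))
        yield train, test
        test_start += test_size  # slide the test window forward


for train, test in sliding_test_window_splits(10, test_size=2, first_test_start=4):
    print(train, test)
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]
# [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

The test window keeps a constant length while the training set grows, so every fold trains only on the past.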

I have attached a slide I used when presenting this principle.
Andy, it is probably not exactly what you had in mind, but I think it is close.
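For comparison, Andy's rolling window keeps the training set at a fixed length, and Andres's message below asks for a minimum CV-training size, a CV-test size, and a choice between fixed and variable training length. A minimal sketch covering both variants follows, assuming chronologically sorted samples and a default step of one sample between folds; `time_series_splits` and its parameters are hypothetical names, not an existing sklearn API.

```python
def time_series_splits(n_samples, min_train_size, test_size,
                       fixed_train_length=False, step=1):
    """Yield (train, test) index lists for chronologically sorted samples.

    Hypothetical helper, not an sklearn API. With fixed_train_length=True the
    training window keeps min_train_size samples and rolls forward ("rolling
    window"); otherwise it starts at index 0 and grows from min_train_size.
    The default step of 1 sample between folds is an assumption.
    """
    if min(min_train_size, test_size, step) < 1:
        raise ValueError("min_train_size, test_size and step must be >= 1")
    train_end = min_train_size  # exclusive end of the training window
    while train_end + test_size <= n_samples:
        # Fixed length: drop the oldest samples; variable length: keep them all.
        train_start = train_end - min_train_size if fixed_train_length else 0
        train = list(range(train_start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        train_end += step


# Rolling (fixed-length) vs. growing (variable-length) training sets:
print(list(time_series_splits(6, min_train_size=3, test_size=2, fixed_train_length=True)))
# [([0, 1, 2], [3, 4]), ([1, 2, 3], [4, 5])]
print(list(time_series_splits(6, min_train_size=3, test_size=2)))
# [([0, 1, 2], [3, 4]), ([0, 1, 2, 3], [4, 5])]
```

Since sklearn's `cv` parameter also accepts an iterable of (train, test) index splits, a generator like this could in principle be passed directly, for example to `cross_val_score`.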

Thanks,
Sylvain




> On 28 Apr 2017 at 17:48, Andreas Mueller <t3kcit at gmail.com> wrote:
> 
> Hey Andres.
> I think there might be a PR for that.
> Can you explain the minimum size of the training set? How is that used?
> I thought the other main option would be "rolling window" cross-validation,
> which uses a fixed-length CV training set.
> 
> So, as I see it, the two options were a rolling window and what we do right now.
> Can you elaborate on the other use cases, like the minimum size of the training set,
> and on why you would want the other options with a variable-length training set?
> 
> Thanks,
> Andy
> 
> On 04/27/2017 09:44 AM, andres lago wrote:
>> Hello,
>>   I'd like to contribute a new feature to sklearn: cross-validation of time series. It extends the current functionality implemented by TimeSeriesSplit.
>> 
>>   TimeSeriesSplit only allows the user to set the number of folds. In practice, cross-validating time series requires other parameters as well, for instance:
>>     - minimum size of the CV training set
>>     - size of the CV test set
>>     - fixed or variable length of the CV training set.
>> 
>>   The functionality is inspired by the R package 'caret'.
>> 
>>   If you agree, I can share my code. I developed it for a project with the French rail company SNCF, and it is now in production.
>> 
>>   Regards,
>>     Andres 
>> 
>> 
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org <mailto:scikit-learn at python.org>
>> https://mail.python.org/mailman/listinfo/scikit-learn <https://mail.python.org/mailman/listinfo/scikit-learn>

[Attachment scrubbed from the archive: Sliding_window.jpg (image/jpeg, 32063 bytes), <http://mail.python.org/pipermail/scikit-learn/attachments/20170428/b5809a81/attachment-0001.jpg>]

