[scikit-learn] Contribution to sklearn: Cross validation of time series

Andreas Mueller t3kcit at gmail.com
Fri Apr 28 11:48:26 EDT 2017


Hey Andres.
I think there might already be a PR for that.
Can you explain how the minimum size of the training set is used?
I thought the other main option would be "rolling window" cross validation,
which uses a fixed-length CV training set.

So to me the two options were rolling window and what we're doing right now
(a training set that grows with each split).
Can you elaborate on the other use cases, like the minimum size of the
training set, and why you would want the other options with a
variable-length training set?
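
To make the options under discussion concrete, here is a minimal sketch of
how a minimum training size, a test size, and a fixed/variable-length switch
could interact. It is an assumption for illustration only: the function name
time_series_splits and its parameters are hypothetical, and it is neither
Andres's code nor the TimeSeriesSplit or caret API.

```python
from typing import Iterator, List, Tuple


def time_series_splits(
    n_samples: int,
    min_train_size: int,
    test_size: int,
    fixed_length: bool = False,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs in time order.

    Hypothetical helper, not part of scikit-learn.
    """
    if min_train_size < 1 or test_size < 1:
        raise ValueError("min_train_size and test_size must be positive")

    # The first split trains on exactly min_train_size samples.
    train_end = min_train_size
    while train_end + test_size <= n_samples:
        # Rolling window: drop the oldest samples so the training set
        # keeps a fixed length of min_train_size.
        # Expanding window: always start from the first sample.
        train_start = train_end - min_train_size if fixed_length else 0
        train = list(range(train_start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        # Advance by one test block so test sets never overlap.
        train_end += test_size


if __name__ == "__main__":
    for fixed in (False, True):
        print("rolling window" if fixed else "expanding window")
        for train, test in time_series_splits(10, 4, 2, fixed_length=fixed):
            print("  train", train, "test", test)
```

With fixed_length=False the training set grows with each split, which
matches the current behaviour described above; with fixed_length=True it
becomes the rolling window. In this sketch the minimum training size only
decides where the first split starts (and the window length in the rolling
case), which is one possible reading of that parameter.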

Thanks,
Andy

On 04/27/2017 09:44 AM, andres lago wrote:
>
> Hello,
>
>   I'd like to contribute new functionality to sklearn: cross validation
> of time series. It extends the current functionality implemented by
> TimeSeriesSplit.
>
>
> TimeSeriesSplit only allows the user to set the number of folds. In
> practice, cross validation of time series requires other parameters,
> for instance:
>
>     - minimum size of the CV training set
>
>     - size of the CV test set
>
>     - fixed or variable length of the CV training set
>
>
>   The functionality is inspired by the R library 'caret'.
>
>
>   If you agree, I can share my code. I developed it for a project with
> the French rail company SNCF, and it's in production now.
>
>
>   Regards,
>
>     Andres
>
