[scikit-learn] suggested machine learning algorithm

Thomas Evangelidis tevang3 at gmail.com
Sun Oct 2 08:23:50 EDT 2016

On 1 October 2016 at 20:48, Алексей Драль <aadral at gmail.com> wrote:

> Hi Thomas,
> What quality do you have on training?
> There is no silver bullet, but there is a quite common technique you can use
> to find out whether you are using an appropriate algorithm. Take a look at the
> difference between the "train" and "validation" quality in the learning curves (
> example
> <http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html#example-model-selection-plot-learning-curve-py>).
> If you see a big gap, you can reduce the complexity of your model to
> overcome overfitting (reduce the interaction parameter / number of variables
> / iterations / ...). If you see a small gap, you can try increasing
> model complexity to fit your data better.

Hi Алексей,

Are the "Training examples" in the learning curves the number of
observations used for training? Don't you think my dataset is too small
(42 observations) for that technique?
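
For reference, the learning-curve check quoted above can be sketched as
follows. This is only an illustration under assumptions that are not from
the thread: a regression task, synthetic data from make_regression standing
in for the real 42 observations, and Ridge as a placeholder model. With so
few samples a single split is very noisy, so the sketch averages over many
random splits.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import ShuffleSplit, learning_curve

# Synthetic stand-in for a 42-observation dataset (assumption, not the real data).
X, y = make_regression(n_samples=42, n_features=5, noise=10.0, random_state=0)

# Many random 80/20 splits, so each point on the curve is an average
# rather than the result of one lucky or unlucky split.
cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)

train_sizes, train_scores, valid_scores = learning_curve(
    Ridge(alpha=1.0),  # placeholder estimator (assumption)
    X,
    y,
    train_sizes=np.linspace(0.3, 1.0, 5),
    cv=cv,
    scoring="r2",
)

# Average over the splits; the train/validation gap is the quantity of interest.
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)
gap = train_mean - valid_mean

for n, tr, va, g in zip(train_sizes, train_mean, valid_mean, gap):
    print(f"n_train={int(n):2d}  train R2={tr:.3f}  validation R2={va:.3f}  gap={g:.3f}")
```

Here train_sizes is the number of observations actually used for fitting
at each point of the curve. A gap that stays large as n_train grows points
to overfitting; a small gap with poor scores on both sides points to a
model that is too simple.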

> Moreover, I see you have a tiny dataset and use a 50/50 split. I presume
> that you will train the "production" model on the whole available dataset. In
> that case, I suggest you use more data for training and an almost-LOO
> <http://scikit-learn.org/stable/modules/cross_validation.html#leave-one-out-loo> approach
> to better estimate your predictive quality. But be really cautious about
> cross-validation, as you can easily overfit your data.
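
A leave-one-out estimate along the lines suggested above might look like
the sketch below. As before, the data (make_regression), the regression
setting, and the Ridge model are placeholders chosen for illustration,
not details from the thread.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in for a 42-observation dataset (assumption, not the real data).
X, y = make_regression(n_samples=42, n_features=5, noise=10.0, random_state=0)

# Each observation is predicted by a model fitted on the other 41.
loo = LeaveOneOut()
y_pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=loo)

# R2 is undefined on a single test sample, so score the pooled
# out-of-sample predictions instead of scoring each fold.
loo_mae = mean_absolute_error(y, y_pred)
loo_r2 = r2_score(y, y_pred)

print(f"folds={loo.get_n_splits(X)}  LOO MAE={loo_mae:.2f}  LOO R2={loo_r2:.3f}")
```

If the same LOO score is also used to pick hyperparameters or features,
it becomes an optimistic estimate; that is the overfitting risk the quoted
message warns about.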