[scikit-learn] Permutation-test-score

Olivier Grisel olivier.grisel at ensta.org
Sun Feb 5 04:44:01 EST 2017


This is a non-parametric (aka brute force) way to check that a model has
predictive performance significantly higher than chance. For a model with
90% accuracy this is useless, as we already know for sure that the model is
better than predicting at random. The method is only useful if you have
very little data, or very noisy data, and you are not even sure that your
predictive method is able to pick up anything predictive from the data at
all, e.g. a balanced binary classification problem where you reach ~52%
accuracy.

It proceeds as follows: it first does a single cross-validation round with
the true labels to compute a reference score. Then it does the same 100
times, each time with an independently, randomly permuted variant of the
labels (the y array). Finally it returns a p-value: the fraction of those
rounds in which the CV score obtained with permuted labels was at least as
high as the reference CV score. A small p-value means the reference score
beat (almost) all of the permuted scores, so the model has likely picked
up real signal rather than noise.

Here is an example:

http://scikit-learn.org/stable/auto_examples/feature_selection/plot_permutation_test_for_classification.html
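
If you want something self-contained to try, here is a minimal sketch
using sklearn.model_selection.permutation_test_score; the synthetic
weak-signal dataset and the choice of estimator are just illustrative
assumptions, not part of the example above:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import permutation_test_score

# Illustrative balanced binary problem with only a little real signal
X, y = make_classification(n_samples=100, n_features=20,
                           n_informative=2, random_state=0)

# score: reference CV score with the true labels
# perm_scores: CV scores over 100 independently permuted copies of y
# pvalue: fraction of perm_scores >= score (with a +1 correction)
score, perm_scores, pvalue = permutation_test_score(
    LogisticRegression(), X, y, cv=5, n_permutations=100,
    random_state=0)

print("reference CV accuracy: %.3f" % score)
print("p-value: %.3f" % pvalue)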

Note that you should not use that method to select the best model from a
collection of candidate models and then report the selected model's
permutation test p-value without correcting for multiple comparisons
(e.g. with a Bonferroni correction).
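
If you do end up testing several candidate models, a crude
Bonferroni-style adjustment looks like this (a sketch; the p-values
below are made up for illustration):

# Hypothetical p-values, one per candidate model
pvalues = [0.04, 0.20, 0.01, 0.33, 0.07]

# Bonferroni: multiply each p-value by the number of tests, cap at 1.0
corrected = [min(p * len(pvalues), 1.0) for p in pvalues]
print(corrected)  # [0.2, 1.0, 0.05, 1.0, 0.35]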

-- 
Olivier
