[scikit-learn] Support Vector Machines: Sensitive to Single Datapoints?

Jacob Vanderplas jakevdp at cs.washington.edu
Tue Dec 19 16:37:35 EST 2017


Hi JohnMark,
SVMs, by design, are quite sensitive to the addition of single data points
– but only if those data points happen to lie near the margin. Points that
fall well outside the margin are not support vectors, so adding or removing
them leaves the fit unchanged. I wrote about some of these details here:
https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html


Hope that helps,
   Jake

 Jake VanderPlas
 Senior Data Science Fellow
 Director of Open Software
 University of Washington eScience Institute

On Tue, Dec 19, 2017 at 1:27 PM, Taylor, Johnmark <
johnmarktaylor at g.harvard.edu> wrote:

> Hello,
>
> I am a researcher in fMRI and am using SVMs to analyze brain data. I am
> doing decoding between two classes, with 24 exemplars per class. I am
> comparing two different methods of cross-validation for my data: in one, I
> train on 23 exemplars from each class and test on the remaining exemplar
> from each class; in the other, I train on 22 exemplars from each class and
> test on the remaining two from each class (in case it matters, the data is
> structured into different neuroimaging "runs", with each "run" containing
> several "blocks"; the first cross-validation method leaves out one block at
> a time, the second leaves out one run at a time).
>
> Now, I would've thought that these two CV methods would give very similar
> results, since the vast majority of the training data is the same; the only
> difference is two additional training points. However, they are yielding
> very different results: training on 23 per class yields 60% decoding
> accuracy (averaged across several subjects, and statistically significantly
> greater than chance), while training on 22 per class yields chance (50%)
> decoding. Leaving aside the particulars of fMRI in this case: is it unusual
> for single points (amounting to less than 5% of the data) to have such a
> big influence on SVM decoding? I am using a cost parameter of C=1. I must
> say it is counterintuitive to me that just a couple points out of two dozen
> could make such a big difference.
>
> Thank you very much, and cheers,
>
> JohnMark
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
