Decoding Differences Between SKL SVM and Matlab Libsvm Even When Parameters the Same
Hello,
I am moving much of my neuroimaging code from Matlab to Python, so I am switching from libsvm in Matlab to the scikit-learn SVM in Python. To make sure I am not changing anything substantive about my analyses, I am experimenting with the two implementations to see whether I can get them to yield identical results.
In Python I am using:
clf = svm.SVC(kernel='linear',C=1,probability=True)
In Matlab (libsvm) I am using:
clf = libsvmtrain(svm_training_labels,svm_training_vectors,['-t 0 -b 1 -c 1'])
When I run the SVM in these two ways on simulated data, I get subtly different results, even though I have set all of the SVM parameters to be the same through the input arguments (linear classifier, C=1, probability estimates enabled), and even though all the other default parameters seem to be the same across these functions (tolerance = .001, both using shrinking heuristics by default).
To give more details regarding the simulations:
One simulation I ran was designed to be absurdly difficult--it yielded 40% accuracy for Matlab libsvm and 44% accuracy for scikit-learn SVM (binary classification, chance = 50%). In this simulation, the two SVMs agreed in their predictions only 18% of the time (in other words, both were guessing below chance, and they nearly always gave opposite guesses from each other).
The other simulation was easier, yielding 68% accuracy for Matlab libsvm and 67% accuracy for scikit-learn SVM. In this simulation, the two SVMs agreed in their predictions 97% of the time. So even though they often got it wrong, they tended to make the same wrong guesses.
Any idea what could be leading to the differences in the results? My understanding is that SKL uses libsvm under the hood, so it has been confusing why the decoders are behaving differently. Both analyses are being run on the same computer (Linux OS).
Thank you very much,
JohnMark Taylor
PhD Student, Harvard Vision Sciences Lab
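For readers who want to reproduce the two quantities reported above (accuracy per implementation, and how often the two implementations agree), here is a minimal sketch. The helper names and the toy label vectors are hypothetical and are not the simulated data from the post; the Matlab predictions are assumed to have been exported and loaded as an array.

```python
import numpy as np

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))

def agreement(pred_a, pred_b):
    """Fraction of test items on which two classifiers give the same label."""
    return float(np.mean(np.asarray(pred_a) == np.asarray(pred_b)))

# Toy binary labels (hypothetical values, for illustration only).
true_labels   = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1])
matlab_preds  = np.array([0, 1, 1, 1, 0, 0, 1, 1, 0, 1])
sklearn_preds = np.array([0, 1, 1, 0, 0, 0, 1, 1, 0, 1])

acc_matlab = accuracy(matlab_preds, true_labels)
acc_sklearn = accuracy(sklearn_preds, true_labels)
agree = agreement(matlab_preds, sklearn_preds)
print(acc_matlab, acc_sklearn, agree)
```

Agreement is computed independently of the true labels, which is why two classifiers can have similar accuracy and still disagree on most items.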
Did you fix the random seeds across implementations as well? Differences in seeds or generators might explain this.
Thanks,
Michael J. Bommarito II, CEO
Bommarito Consulting, LLC
Web: http://www.bommaritollc.com
Mobile: +1 (646) 450-3387
On Wed, Jun 22, 2016 at 1:15 PM, Taylor, Johnmark <johnmarktaylor@g.harvard.edu> wrote:
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
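On the scikit-learn side, the seed question can be checked with a short experiment. This is a sketch under assumptions: the simulated data and the helper fit_and_predict are made up for illustration, and it only exercises scikit-learn, not the Matlab interface. In scikit-learn, the random_state argument of SVC controls the data shuffling used when probability=True, so it is the probability estimates, rather than the labels from predict, that would be expected to depend on the seed.

```python
import numpy as np
from sklearn import svm

# Hypothetical noisy binary problem standing in for the simulated data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=200) > 0).astype(int)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

def fit_and_predict(seed):
    """Fit the SVC from the post with a fixed seed; return labels and probabilities."""
    clf = svm.SVC(kernel='linear', C=1, probability=True, random_state=seed)
    clf.fit(X_train, y_train)
    return clf.predict(X_test), clf.predict_proba(X_test)

labels_a, proba_a = fit_and_predict(seed=0)
labels_b, proba_b = fit_and_predict(seed=0)  # same seed as labels_a
labels_c, proba_c = fit_and_predict(seed=1)  # different seed

print("same seed, same probabilities:", np.allclose(proba_a, proba_b))
print("different seed, same labels:", np.array_equal(labels_a, labels_c))
print("largest probability difference across seeds:",
      float(np.max(np.abs(proba_a - proba_c))))
```

If the labels on the Matlab side are derived from the probability estimates rather than from the decision values, comparing them against argmax of predict_proba as well as against predict may help narrow down where the two pipelines diverge.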
Did you try using the Python API to libsvm directly instead of going through SKL? I'm guessing you have it on your computer since you have the Matlab API. That would at least let you test whether the difference comes from the fake data or from SKL.
Also, are you loading the fake data from a .mat file into Python (e.g. with the SciPy 'loadmat' function), or are you generating it from a script? Maybe some floating-point difference between Python and Matlab is giving you the different results. This could happen if you generate the data with a script written separately in Python and in Matlab, for example, along the same lines as the random number generators giving different results.
On Jun 22, 2016 1:27 PM, "Michael Bommarito" <michael@bommaritollc.com> wrote:
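One way to rule out data-generation differences is to generate the simulated data once, save it to a .mat file, and load that same file in both Matlab and Python. The sketch below shows the Python side of such a round trip with SciPy's savemat and loadmat. The file name and the random data are hypothetical; the variable names are borrowed from the libsvmtrain call in the original post.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical simulated data; in practice this would be generated exactly once.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(20, 4))
labels = rng.integers(0, 2, size=20)

# Write the data to a .mat file that Matlab can also read with load().
path = os.path.join(tempfile.mkdtemp(), "svm_sim_data.mat")
savemat(path, {"svm_training_vectors": vectors,
               "svm_training_labels": labels})

# Read it back. loadmat returns 1-D arrays as 2-D, so flatten the labels.
loaded = loadmat(path)
loaded_vectors = loaded["svm_training_vectors"]
loaded_labels = loaded["svm_training_labels"].ravel()

print(np.array_equal(loaded_vectors, vectors))
print(np.array_equal(loaded_labels, labels))
```

With both environments reading the same file, any remaining disagreement cannot be attributed to the two random number generators producing different data.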
participants (3)
- David Nicholson
- Michael Bommarito
- Taylor, Johnmark