Re: [scikit-learn] Decoding Differences Between SKL SVM and Matlab Libsvm Even When Parameters the Same
Many thanks for the responses thus far!
*Did you fix the random seeds across implementations as well? Differences in seeds or generators might explain this.*
The implementation of libsvm used by Matlab always has a seed of 1. I tried setting the seed for SKL SVM to 1 (and 0, 2, 3, and 4) as well, and the results were still different.
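For reference, the scikit-learn side of that seed experiment can be sketched roughly as follows. The data here is a hypothetical stand-in, not the original fake dataset, and note that `random_state` only has an effect when `probability=True`:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-in data (the original fake dataset is not shown here).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 3) - 1, rng.randn(20, 3) + 1])
y = np.array([0] * 20 + [1] * 20)

# random_state seeds the internal cross-validation used for Platt
# scaling; it is only consulted when probability=True.
clf = SVC(kernel="linear", C=1.0, probability=True, random_state=1)
clf.fit(X, y)
probs = clf.predict_proba(X)
```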
*Did you try using the Python API to libsvm directly instead of through SKL? I'm guessing you have it on your computer since you have the Matlab API. That would at least let you test whether it's the fake data or whether it's SKL.*
I'll give that a shot next, thanks!
*Also are you loading the fake data from a .mat file into Python (e.g. with the SciPy 'loadmat' function) or are you generating it from a script? Maybe some weird floating point error between Python and Matlab is giving you the different results? This could happen if you generate the data with a script written in both Python and Matlab, for example... along the same lines as the random seed generator giving different results*
I'm generating the fake data with a Python script and saving it to a .txt file, which is then loaded in by Python and Matlab in their respective scripts. To make sure there's no truncation error going on when they load in this .txt file to get the fake data, I applied the floor function to both sets of vectors (to make them ints) in both the Python and Matlab scripts, and they still give different results. So I don't think it's a data issue.
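The save/floor round trip described above might look roughly like this on the Python side (the file name is hypothetical):

```python
import numpy as np

# Generate hypothetical fake data and save it as plain text --
# the same file both the Python and Matlab scripts would read.
rng = np.random.RandomState(0)
X = rng.randn(40, 3) * 10
np.savetxt("fake_data.txt", X, fmt="%.10f")

# Reload and floor, so both languages see identical integer-valued
# features regardless of any parsing/precision differences.
X_loaded = np.floor(np.loadtxt("fake_data.txt"))
```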
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
Actually, I wonder if there is a difference between our implementation and Matlab's behavior. We seem to reset the seed to a hard-coded value when calling predict and predict_proba.

In predict() and predict_proba() in here, we call set_predict_params():
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm....
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm....

However, set_predict_params() appears to reset the RNG to a hard-coded value of -1:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm....

Because you are requesting probability estimates, the state of the RNG will affect the resulting scores. If Matlab doesn't similarly reset the RNG prior to each predict call, then a difference would manifest here. I think if the underlying support vectors match but our predictions do not, this might explain it.

Thanks, Michael J. Bommarito II, CEO Bommarito Consulting, LLC *Web:* http://www.bommaritollc.com *Mobile:* +1 (646) 450-3387

On Wed, Jun 22, 2016 at 3:07 PM, Michael Bommarito <michael@bommaritollc.com> wrote:
Have you tried comparing the fit support vectors prior to comparing predicted values? You might need to set SaveSupportVectors in Matlab first.
Thanks, Michael J. Bommarito II, CEO Bommarito Consulting, LLC *Web:* http://www.bommaritollc.com *Mobile:* +1 (646) 450-3387
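The two checks raised in this thread — comparing the fitted support vectors, and the repeatability of probability estimates given the hard-coded reseed — can be sketched together on the scikit-learn side. The data and output file name below are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-in data.
rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SVC(kernel="linear", C=1.0, probability=True, random_state=1).fit(X, y)

# Check 1: support_vectors_ exposes the fitted support vectors (rows of X
# indexed by support_), which can be dumped for element-wise comparison
# with the support vectors Matlab saves.
np.savetxt("skl_support_vectors.txt", clf.support_vectors_)

# Check 2: repeated predict_proba calls on the same fitted model return
# identical estimates, consistent with the RNG being reset per call.
p1 = clf.predict_proba(X)
p2 = clf.predict_proba(X)
```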
On Wed, Jun 22, 2016 at 2:50 PM, Taylor, Johnmark <johnmarktaylor@g.harvard.edu> wrote:
participants (2)
- Michael Bommarito
- Taylor, Johnmark