[scikit-learn] Decoding Differences Between SKL SVM and Matlab Libsvm Even When Parameters the Same

Michael Bommarito michael at bommaritollc.com
Wed Jun 22 15:16:30 EDT 2016


  Actually, I wonder if there is a difference between our implementation
and Matlab's behavior.  We seem to reset the seed to a hard-coded value
when calling predict and predict_proba:

  Both predict() and predict_proba() in libsvm.pyx call set_predict_params():
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L315
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L381

  However, set_predict_params() appears to reset the RNG seed to a
hard-coded value of -1:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L261

  Because you are requesting probability estimates, the state of the RNG
will affect the resulting scores.  If Matlab doesn't similarly reset the
RNG prior to each predict call, then a difference would manifest here.
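
  To illustrate the reasoning only, here is a minimal sketch that uses
Python's standard random module as a stand-in for the RNG. It is not
libsvm's generator and not our predict code; noisy_score,
scores_with_reset, and scores_without_reset are hypothetical names. It
shows that resetting the seed before every call yields identical outputs,
while seeding once lets later calls drift:

```python
import random


def noisy_score(rng):
    # Stand-in for any computation that consumes random numbers.
    return sum(rng.random() for _ in range(5))


def scores_with_reset(seed, n_calls):
    # Reseed before every call, analogous to resetting the RNG per predict.
    rng = random.Random()
    out = []
    for _ in range(n_calls):
        rng.seed(seed)
        out.append(noisy_score(rng))
    return out


def scores_without_reset(seed, n_calls):
    # Seed once; each call continues from where the previous one stopped.
    rng = random.Random(seed)
    return [noisy_score(rng) for _ in range(n_calls)]


with_reset = scores_with_reset(1, 3)
without_reset = scores_without_reset(1, 3)
print(with_reset)
print(without_reset)
```

  Only the first call agrees between the two variants, so two
implementations that share a seed but differ in when they reset it could
still disagree on later calls.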

  I think if the underlying support vectors match but our predictions do
not, this might explain it.
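
  A rough way to run that check, assuming you have exported the support
vectors from both sides as lists of rows (on the SKL side they are in the
fitted model's support_vectors_ attribute). The helper name
same_support_vectors and the tolerances are my own assumptions, not part
of either library:

```python
import math


def same_support_vectors(sv_a, sv_b, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if both sets contain the same rows, ignoring row order."""
    if len(sv_a) != len(sv_b):
        return False
    # Sort rows so the order each library stores them in does not matter.
    # Note: sorting is exact, so near-ties in the leading columns could
    # pair rows differently; this is a sketch, not a robust matcher.
    rows_a = sorted(tuple(row) for row in sv_a)
    rows_b = sorted(tuple(row) for row in sv_b)
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        for x, y in zip(row_a, row_b):
            if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


# Toy values only, not real model output.
skl_sv = [[1.0, 2.0], [3.0, 4.0]]
matlab_sv = [[3.0, 4.0], [1.0, 2.0 + 1e-13]]
print(same_support_vectors(skl_sv, matlab_sv))
```

  If this returns True on your real exports while the predicted
probabilities still differ, the fit is probably not where the two
implementations diverge.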

Thanks,
Michael J. Bommarito II, CEO
Bommarito Consulting, LLC
*Web:* http://www.bommaritollc.com
*Mobile:* +1 (646) 450-3387

On Wed, Jun 22, 2016 at 3:07 PM, Michael Bommarito <michael at bommaritollc.com> wrote:

>   Have you tried comparing the fitted support vectors before comparing
> predicted values?  You might need to set SaveSupportVectors in Matlab first.
>
> Thanks,
> Michael J. Bommarito II, CEO
> Bommarito Consulting, LLC
> *Web:* http://www.bommaritollc.com
> *Mobile:* +1 (646) 450-3387
>
> On Wed, Jun 22, 2016 at 2:50 PM, Taylor, Johnmark <johnmarktaylor at g.harvard.edu> wrote:
>
>> Many thanks for the responses thus far!
>>
>>
>> *Did you fix the random seeds across implementations as well?
>> Differences in seeds or generators might explain this.*
>>
>> The implementation of libsvm used by Matlab always has a seed of 1. I
>> tried setting the seed for SKL SVM to 1 (and 0, 2, 3, and 4) as well, and
>> the results were still different.
>>
>> *Did you try using the Python API to libsvm directly instead of through
>> SKL? I'm guessing you have it on your computer since you have the Matlab
>> API. That would at least let you test whether it's the fake data or whether
>> it's SKL.*
>>
>> I'll give that a shot next, thanks!
>>
>> *Also are you loading the fake data from a .mat file into Python (e.g.
>> with the SciPy 'loadmat' function) or are you generating it from a script?
>> Maybe some weird floating point error between Python and Matlab is giving
>> you the different results? This could happen if you generate the data with a
>> script written in both Python and Matlab, for example... along the same
>> lines as the random seed generator giving different results*
>>
>> I'm generating the fake data with a Python script and saving it to a .txt
>> file, which is then loaded in by Python and Matlab in their respective
>> scripts. To make sure there's no truncation error going on when they load
>> in this .txt file to get the fake data, I applied the floor function to
>> both sets of vectors (to make them ints) in both the Python and Matlab
>> scripts, and they still give different results. So I don't think it's a
>> data issue.