<div dir="ltr"><div>Actually, I wonder whether our implementation and Matlab's differ in how they seed the RNG. We seem to reset the seed to a hard-coded value whenever predict or predict_proba is called:</div><div><br></div><div>Both predict() and predict_proba() call set_predict_params() here:</div><div><a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L315">https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L315</a></div><div><a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L381">https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L381</a><br></div><div><br></div><div>However, set_predict_params() appears to reset the RNG seed to a hard-coded value of -1:</div><div><a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L261">https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L261</a></div><div><br></div><div>Because you are requesting probability estimates, the state of the RNG will affect the resulting scores. If Matlab doesn't similarly reset the RNG before each predict call, a difference could show up here.</div><div><br></div><div>If the underlying support vectors match but the predictions do not, I think this might explain it.</div></div>
<div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><font color="#444444">Thanks,</font><div><div><font color="#444444">Michael J. Bommarito II, CEO</font></div><div><font color="#444444">Bommarito Consulting, LLC</font></div><div><font color="#444444"><b>Web:</b> </font><a href="http://www.bommaritollc.com" target="_blank"><font color="#3d85c6">http://www.bommaritollc.com</font></a></div><div><b><font color="#444444">Mobile:</font></b> <a><font color="#3d85c6">+1 (646) 450-3387</font></a></div></div></div></div></div>
<br><div class="gmail_quote">On Wed, Jun 22, 2016 at 3:07 PM, Michael Bommarito <span dir="ltr">&lt;<a href="mailto:michael@bommaritollc.com" target="_blank">michael@bommaritollc.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Have you tried comparing the fitted support vectors before comparing predicted values? You might need to set SaveSupportVectors in Matlab first.</div><div class="gmail_extra"><span class=""><br clear="all">
<br></span><div class="gmail_quote"><div><div class="h5">On Wed, Jun 22, 2016 at 2:50 PM, Taylor, Johnmark <span dir="ltr">&lt;<a href="mailto:johnmarktaylor@g.harvard.edu" target="_blank">johnmarktaylor@g.harvard.edu</a>&gt;</span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div dir="ltr">Many thanks for the responses thus far!<span><div><br></div><div><i><span style="font-size:13px">Did you fix the random seeds across implementations as well? Differences</span><br style="font-size:13px"><span style="font-size:13px">in seeds or generators might explain this.</span></i><br></div><div><i><span style="font-size:13px"><br></span></i></div></span><div>The implementation of libsvm used by Matlab always has a seed of 1. I tried setting the seed for SKL SVM to 1 (and 0, 2, 3, and 4) as well, and the results were still different.</div><span><div><br></div><div><i><span style="font-size:13px">Did you try using the Python API to libsvm directly instead of through SKL?</span><br style="font-size:13px"><span style="font-size:13px">I'm guessing you have it on your computer since you have the Matlab API.</span><br style="font-size:13px"><span style="font-size:13px">That would at least let you test whether it's the fake data or whether it's</span><br style="font-size:13px"><span style="font-size:13px">SKL.</span></i></div><div><i><span style="font-size:13px"><br></span></i></div></span><div>I'll give that a shot next, thanks!</div><span><div><i><br style="font-size:13px"><span style="font-size:13px">Also are you loading the fake data from a .mat file into Python (e.g. with</span><br style="font-size:13px"><span style="font-size:13px">the SciPy 'loadmat' function) or are you generating it from a script? 
Maybe</span><br style="font-size:13px"><span style="font-size:13px">some weird floating point error between Python and Matlab is giving you the</span><br style="font-size:13px"><span style="font-size:13px">different results? This could happen if you generate the data with a script</span><br style="font-size:13px"><span style="font-size:13px">written in both Python and Matlab, for example... along the same lines as</span><br style="font-size:13px"><span style="font-size:13px">the random seed generator giving different results</span></i><br></div><div><i><span style="font-size:13px"><br></span></i></div></span><div>I'm generating the fake data with a Python script and saving it to a .txt file, which is then loaded in by Python and Matlab in their respective scripts. To make sure there's no truncation error going on when they load in this .txt file to get the fake data, I applied the floor function to both sets of vectors (to make them ints) in both the Python and Matlab scripts, and they still give different results. So I don't think it's a data issue. </div></div>
<br></div></div><span class="">_______________________________________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
<br></span></blockquote></div><br></div>
</blockquote></div><br></div>