<div dir="ltr"><div dir="ltr"><div><b>The set of independent regressions described in Wikipedia is *not* an OvR model.</b> It is just a (weird) way to understand the multinomial logistic regression model.</div><div>OvR logistic regression and multinomial logistic regression are two different models.</div><div><br></div><div>In multinomial logistic regression viewed as a set of independent binary regressions, as described in Wikipedia, you have K - 1 binary regressions, each between class k (k from 1 to K - 1) and the pivot class K.</div><div>In OvR logistic regression, by contrast, you have K binary regressions, each between class k (k from 1 to K) and "not class k".</div><div>The normalization is therefore different.</div><div><br></div><div>Indeed, in multinomial logistic regression as a set of independent binary regressions, you have (from the beginning) the property 1 = sum_k p(y = k). The normalization 1 / (1 + sum_{k=1}^{K - 1} exp(beta_k X)) comes from computing p(y = K) afterwards using this property.</div><div>In OvR logistic regression, on the other hand, you only have 1 = p_k(y = k) + p_k(y != k) for each binary model k. Therefore the probabilities p_k(y = k) do not sum to one, and you need to divide them by sum_{k=1}^{K} p_k(y = k) to obtain a valid probability distribution for the OvR model. OneVsRestClassifier normalizes in the same way (<a href="https://github.com/scikit-learn/scikit-learn/blob/1a850eb5b601f3bf0f88a43090f83c51b3d8c593/sklearn/multiclass.py#L350-L351">https://github.com/scikit-learn/scikit-learn/blob/1a850eb5b601f3bf0f88a43090f83c51b3d8c593/sklearn/multiclass.py#L350-L351</a>).</div><div><br></div><div>But I agree that this description of the multinomial model is quite confusing compared to the log-linear/softmax description.</div><div><br></div><div>Tom</div></div></div><br>
<div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 7, 2019 at 08:31, Guillaume Lemaître &lt;<a href="mailto:g.lemaitre58@gmail.com">g.lemaitre58@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>I was looking earlier at the code of predict_proba in LDA and LogisticRegression. While we certainly have some bugs, I was a bit confused, and I thought an email would be better than opening an issue since this might not be one.</div><div><br></div><div>In the case of multiclass classification, the probabilities can be computed under two different assumptions: either as a set of independent binary regressions or as a log-linear model (<a href="https://en.wikipedia.org/wiki/Multinomial_logistic_regression" target="_blank">https://en.wikipedia.org/wiki/Multinomial_logistic_regression</a>).</div><div><br></div><div>Then, we can compute the probabilities either by using a class as a pivot and computing exp(beta_c X) / (1 + sum(exp(beta_k X))), or by using all classes and computing a softmax.</div><div><br></div><div>My question is related to LogisticRegression in the OvR scheme. Naively, I thought that it corresponded to the former case (a set of independent regressions). However, we are using another normalization there, which was first implemented in liblinear. I searched liblinear's issue tracker and found: <a href="https://github.com/cjlin1/liblinear/pull/20" target="_blank">https://github.com/cjlin1/liblinear/pull/20</a></div><div><br></div><div>It is related to the following paper: <a href="https://www.csie.ntu.edu.tw/~cjlin/papers/generalBT.pdf" target="_blank">https://www.csie.ntu.edu.tw/~cjlin/papers/generalBT.pdf</a></div><div><br></div><div>My math skills are limited and I am not sure I grasp what is going on. 
Could anybody shed some light on this OvR normalization and on why it differs from the case of a set of independent regressions described in Wikipedia?<br></div><div><br></div><div>Cheers,<br></div><div><div>-- <br><div dir="ltr" class="gmail-m_2015834652004284878gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>Guillaume Lemaitre<br>INRIA Saclay - Parietal team<br>Center for Data Science Paris-Saclay<br><a href="https://glemaitre.github.io/" target="_blank">https://glemaitre.github.io/</a></div></div></div></div></div></div></div></div></div></div></div></div></div>

_______________________________________________<br>

scikit-learn mailing list<br>

<a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>

</blockquote></div>
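<div>To make the two normalizations discussed in this thread concrete, here is a minimal sketch in plain Python. It is not the scikit-learn or liblinear code: the function names <code>multinomial_pivot_proba</code> and <code>ovr_proba</code> are hypothetical, and the linear scores (the values of beta_k X) are made up for illustration.</div>

```python
import math


def multinomial_pivot_proba(scores):
    """Multinomial model with class K as pivot.

    scores holds the K - 1 linear scores beta_k X (hypothetical inputs);
    the pivot class K implicitly has score 0.
    """
    exps = [math.exp(s) for s in scores]
    denom = 1.0 + sum(exps)
    # Classes 1..K-1 first, then the pivot class K, which gets 1 / denom.
    return [e / denom for e in exps] + [1.0 / denom]


def ovr_proba(scores):
    """OvR model: K independent 'class k vs. not class k' binary models.

    scores holds K linear scores, one per binary model (hypothetical inputs).
    """
    # Each binary model gives its own p_k(y = k); these need not sum to one.
    sigmoids = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sigmoids)
    # Divide by the sum to obtain a valid distribution over the K classes.
    return [p / total for p in sigmoids]


# Made-up scores for a three-class problem.
p_multi = multinomial_pivot_proba([0.5, -1.0])  # K - 1 = 2 scores
p_ovr = ovr_proba([0.5, -1.0, 0.0])             # K = 3 scores

print(p_multi, sum(p_multi))
print(p_ovr, sum(p_ovr))
```

<div>In the pivot version the probabilities sum to one by construction, so the 1 / (1 + sum) term is only the probability of the pivot class. In the OvR version the division by the sum is an extra step applied after fitting K separate binary models.</div>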