<div dir="ltr">I am not sure you are using "calibrated" in the correct sense.<div>Calibrated means that the predicted probabilities match the real-world frequencies.</div><div>So if you have a rare class, it should receive low predicted probabilities on average.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Nov 17, 2020 at 9:58 AM Sole Galli via scikit-learn &lt;<a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Hello team,<br></div><div><br></div><div>I am trying to understand why logistic regression returns uncalibrated probabilities, with values skewed toward low probabilities for the positive (rare) cases, when trained on an imbalanced dataset.<br></div><div><br></div><div>I've read a number of articles. All seem to agree that this is the case, and many show empirical evidence, but none gives a mathematical proof. When I test it myself, I can see that this is indeed the case: logistic regression on imbalanced datasets returns uncalibrated probabilities.<br></div><div><br></div><div>I understand that it has to do with the cost function, because if we re-balance the dataset with, say, class_weight='balanced', then the probabilities seem to be calibrated as a result.<br></div><div><br></div><div>I was wondering if any of you knows a mathematical proof that supports this conclusion, or a clear explanation of why logistic regression would return uncalibrated probabilities when trained on an imbalanced dataset?<br></div><div><br></div><div>Any link to a relevant article, video, presentation, etc., will be greatly appreciated.<br></div><div><br></div><div>Thanks a lot!<br></div><div><br></div><div>Sole<br></div><div><br></div>
</blockquote></div>
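<div>To make the definition of "calibrated" concrete, here is a minimal sketch. It assumes synthetic one-feature data with roughly 5% positives (the data, sample size, and variable names are illustrative and not from the thread). It compares the average predicted probability of the rare class with its observed frequency, once for a plain LogisticRegression and once with class_weight='balanced'.</div>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data: about 5% positives, one overlapping feature.
rng = np.random.default_rng(0)
n = 20000
y = rng.binomial(1, 0.05, size=n)
# Negatives are drawn from N(0, 1), positives from N(1, 1).
X = rng.normal(loc=y * 1.0, scale=1.0).reshape(-1, 1)

# Observed frequency of the rare class.
base_rate = y.mean()

# Plain logistic regression, no reweighting.
plain = LogisticRegression().fit(X, y)
mean_plain = plain.predict_proba(X)[:, 1].mean()

# Same model with the classes reweighted to look balanced.
balanced = LogisticRegression(class_weight="balanced").fit(X, y)
mean_balanced = balanced.predict_proba(X)[:, 1].mean()

print("observed positive rate:        ", round(base_rate, 3))
print("mean predicted prob, plain:    ", round(mean_plain, 3))
print("mean predicted prob, balanced: ", round(mean_balanced, 3))
```

<div>Under these assumptions, the plain model's average predicted probability should sit close to the observed positive rate, which is what calibration asks for, while the reweighted model should predict noticeably higher probabilities for the rare class than its real frequency. A fuller check would bin the predictions, for example with sklearn.calibration.calibration_curve, on held-out data.</div>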