<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Thanks for the reply.  The covariates ("X") are all dummy/categorical variables, so no, nothing is normalized.
<div class="">
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Dec 15, 2016, at 1:54 PM, Alexey Dral &lt;<a href="mailto:aadral@gmail.com" class="">aadral@gmail.com</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">Hi Rachel,
<div class=""><br class="">
</div>
<div class="">Do you have your data normalized?<br class="">
<div class="gmail_extra"><br class="">
<div class="gmail_quote">2016-12-15 20:21 GMT+03:00 Rachel Melamed <span dir="ltr" class="">
&lt;<a href="mailto:melamed@uchicago.edu" target="_blank" class="">melamed@uchicago.edu</a>&gt;</span>:<br class="">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word" class="">
<div class="">Hi all,</div>
<div class="">Does anyone have any suggestions for this problem:</div>
<a href="http://stackoverflow.com/questions/41125342/sklearn-logistic-regression-gives-biased-results" target="_blank" class="">http://stackoverflow.com/<wbr class="">questions/41125342/sklearn-<wbr class="">logistic-regression-gives-<wbr class="">biased-results</a>
<div class=""><br class="">
</div>
<div class="">
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:15px;clear:both;color:rgb(36,39,41);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;background-color:rgb(255,255,255)" class="">
I am running around 1000 similar logistic regressions, with the same covariates but slightly different data and response variables. All of my response variables have sparse successes (usually p(success) &lt; 0.05).</p>
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:15px;clear:both;color:rgb(36,39,41);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;background-color:rgb(255,255,255)" class="">
I noticed that with the regularized regression, the results are consistently biased to predict more "successes" than are observed in the training data. When I relax the regularization, this bias goes away. The bias is unacceptable for my use case, but
 the more-regularized model otherwise seems a bit better.</p>
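<p class="">A minimal sketch of how this check could be reproduced on synthetic data. Everything in it is an assumption made for illustration and does not come from the original question: the data generator, the two values of C, the choice of the liblinear solver, and the helper name <code>mean_predicted_rate</code>. It compares the average predicted p(success) on the training data with the observed success rate under strong and weak regularization.</p>
<pre class="">
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (illustrative values only):
# five 0/1 dummy covariates and a rare outcome.
n_samples = 5000
X = rng.binomial(1, 0.3, size=(n_samples, 5))
true_coef = np.array([0.5, -0.5, 0.3, 0.0, -0.2])
true_intercept = -3.5
p_true = 1.0 / (1.0 + np.exp(-(true_intercept + X @ true_coef)))
y = rng.binomial(1, p_true)

observed_rate = y.mean()


def mean_predicted_rate(C):
    """Fit an L2-penalized model; return its mean predicted p(success)."""
    # solver="liblinear" is an assumption; the thread does not name a solver.
    model = LogisticRegression(C=C, solver="liblinear")
    model.fit(X, y)
    return model.predict_proba(X)[:, 1].mean()


rate_strong = mean_predicted_rate(C=0.01)  # heavy regularization
rate_weak = mean_predicted_rate(C=1e4)     # almost no regularization

print(f"observed success rate:       {observed_rate:.4f}")
print(f"mean prediction, C=0.01:     {rate_strong:.4f}")
print(f"mean prediction, C=10000:    {rate_weak:.4f}")
```
</pre>
<p class="">If the same pattern appears, the mean prediction at small C sits above the observed rate while the weakly regularized fit matches it closely.</p>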
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:15px;clear:both;color:rgb(36,39,41);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;background-color:rgb(255,255,255)" class="">
Below, I plot the results for the 1000 different regressions for 2 different values of C: <a href="https://i.stack.imgur.com/1cbrC.png" rel="nofollow noreferrer" style="margin:0px;padding:0px;border:0px;color:rgb(0,89,153);text-decoration:none" target="_blank" class=""><img src="https://i.stack.imgur.com/1cbrC.png" alt="results for the different regressions for 2 different values of C" style="margin:0px;padding:0px;border:0px;max-width:100%" class=""></a></p>
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:15px;clear:both;color:rgb(36,39,41);font-family:Arial,'Helvetica Neue',Helvetica,sans-serif;background-color:rgb(255,255,255)" class="">
I looked at the parameter estimates for one of these regressions: in the plot below, each point is one parameter. It seems like the intercept (the point on the bottom left) is too high for the C=1 model. <a href="https://i.stack.imgur.com/NTFOY.png" rel="nofollow noreferrer" style="margin:0px;padding:0px;border:0px;color:rgb(0,89,153);text-decoration:none" target="_blank" class=""><img src="https://i.stack.imgur.com/NTFOY.png" alt="parameter estimates for one regression, one point per parameter" style="margin:0px;padding:0px;border:0px;max-width:100%" class=""></a></p>
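<p class="">One possible explanation for an intercept that looks too high, offered only as a hypothesis: scikit-learn's liblinear solver applies the penalty to the intercept as well, and the <code>intercept_scaling</code> parameter exists to lessen that effect, whereas the lbfgs solver leaves the intercept unpenalized. The thread does not say which solver was used, so the sketch below is an assumption-laden illustration on synthetic data (the data, the value of C, and the dictionary labels are all made up here), not a diagnosis of the original models.</p>
<pre class="">
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic rare-outcome data with 0/1 dummy covariates (illustrative only).
n_samples = 5000
X = rng.binomial(1, 0.3, size=(n_samples, 5))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(3.5 - 0.4 * X[:, 0])))
observed_rate = y.mean()

C = 0.01  # deliberately strong regularization so the effect is visible

# Same penalty strength, three treatments of the intercept.
fits = {
    "liblinear (intercept penalized)": LogisticRegression(
        C=C, solver="liblinear"
    ),
    "liblinear, intercept_scaling=100": LogisticRegression(
        C=C, solver="liblinear", intercept_scaling=100.0
    ),
    "lbfgs (intercept not penalized)": LogisticRegression(
        C=C, solver="lbfgs"
    ),
}

results = {}
for name, model in fits.items():
    model.fit(X, y)
    intercept = model.intercept_[0]
    mean_pred = model.predict_proba(X)[:, 1].mean()
    results[name] = (intercept, mean_pred)
    print(f"{name}: intercept={intercept:.3f}, mean prediction={mean_pred:.4f}")

print(f"observed success rate: {observed_rate:.4f}")
```
</pre>
<p class="">If the intercept penalty is the cause, the first fit should show a less negative intercept and a mean prediction above the observed rate, while the other two should track the observed rate closely.</p>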
</div>
</div>
<br class="">
______________________________<wbr class="">_________________<br class="">
scikit-learn mailing list<br class="">
<a href="mailto:scikit-learn@python.org" class="">scikit-learn@python.org</a><br class="">
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank" class="">https://mail.python.org/<wbr class="">mailman/listinfo/scikit-learn</a><br class="">
<br class="">
</blockquote>
</div>
<br class="">
<br clear="all" class="">
<div class=""><br class="">
</div>
-- <br class="">
<div class="gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr" class="">
<div class="">
<div dir="ltr" class="">
<div dir="ltr" class="">
<div class="">Yours sincerely,</div>
<div class=""><span style="font-size:12.8px" class="">Alexey A. Dral</span></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</body>
</html>