<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 5, 2017 at 2:52 PM, Stuart Reynolds <span dir="ltr"><<a href="mailto:stuart@stuartreynolds.net" target="_blank">stuart@stuartreynolds.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Turns out sm.Logit does allow setting the tolerance.<br>

Some and quick and dirty time profiling of different methods on a 100k<br>

* 30 features dataset, with different solvers and losses:<br>

<br>

sklearn.LogisticRegression: l1 1.<a href="tel:13864398003" value="+13864398003">13864398003</a> (seconds)<br>

sklearn.LogisticRegression: l2 0.0538778305054<br>

sm.Logit l1 0.0922629833221  # Although didn't converge<br>

sm.Logit l1_cvxopt_cp 0.958268165588<br>

sm.Logit newton 0.133476018906<br>

sm.Logit nm 0.369864940643<br>

sm.Logit bfgs 0.105798006058<br>

sm.Logit lbfgs 0.06241106987<br>

sm.Logit powell 1.64219808578<br>

sm.Logit cg 0.2184278965<br>

sm.Logit ncg 0.216138124466<br>

sm.Logit basinhopping 8.82164621353<br>

sm.GLM.fit IRLS 0.544688940048<br>

sm.GLM L2: 1.29778695107<br>

<br>

I've been getting good results from sm.Logit.fit (although unregularized).<br>

statsmodels GLM seems a little slow. Not sure why.<br>

<br>

My benchmark may be a little apples-to-oranges, since the stopping<br>

criteria probably aren't comparable.<br></blockquote><div><br></div><div>I think that's a problem with GLM IRLS.</div><div>AFAIK, but never fully tested, is that the objective function is</div><div>proportional to the number of observations and the convergence</div><div>criterion becomes tighter as nobs increases.</div><div>I don't find the issue or PR discussion anymore, but one of our</div><div>contributors fixed maxiter at 15 or something like that for IRLS with</div><div>around 4 to 5 million observations and mostly categorical explanatory</div><div>variables in his application.</div><div><br></div><div>unfortunately (no upfront design and decisions across models)</div><div><a href="https://github.com/statsmodels/statsmodels/issues/2825">https://github.com/statsmodels/statsmodels/issues/2825</a></div><div><br></div><div>Josef</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

<br>

For tiny models, which I'm also building: 100 samples, 5 features<br>

<br>

sklearn.LogisticRegression: l1 0.00137376785278<br>

sklearn.LogisticRegression: l2 0.00167894363403<br>

sm.Logit l1 0.0198900699615<br>

sm.Logit l1_cvxopt_cp 0.162448167801<br>

sm.Logit newton 0.00689911842346<br>

sm.Logit nm 0.0754928588867<br>

sm.Logit bfgs 0.0210938453674<br>

sm.Logit lbfgs 0.0156588554382<br>

sm.Logit powell 0.0161390304565<br>

sm.Logit cg 0.00759506225586<br>

sm.Logit ncg 0.00541186332703<br>

sm.Logit basinhopping 0.3076171875<br>

sm.GLM.fit IRLS 0.00902199745178<br>

sm.GLM L2: 0.0208361148834<br>

<br>

I couldn't get sm.GLM.fit to work with non "IRLS" solvers. (hits a<br>

division by zero).<br>

<br>

----<br>

<br>

import sklearn.datasets<br>

from sklearn.preprocessing import StandardScaler<br>

X, y = sklearn.datasets.make_<wbr>classification(n_samples=<wbr>10000,<br>

n_features=30, random_state=123)<br>

X = StandardScaler(copy=True, with_mean=True, with_std=True).fit_transform(<wbr>X)<br>

<br>

import time<br>

tol = 0.0001<br>

maxiter = 100<br>

DISP = 0<br>

<br>

<br>

if 1: # sk.LogisticRegression<br>

    import sklearn<br>

    from sklearn.linear_model import LogisticRegression<br>

<br>

    for method in ["l1", "l2"]: # TODO, add solvers:<br>

        t = time.time()<br>

        model = LogisticRegression(C=1, tol=tol, max_iter=maxiter,<br>

penalty=method)<br>

        model.fit(X,y)<br>

        print "sklearn.LogisticRegression:", method, time.time() - t<br>

<br>

<br>

<br>

<br>

if 1: # sm.Logit.fit_regularized<br>

    from statsmodels.discrete.discrete_<wbr>model import Logit<br>

    for method in ["l1", "l1_cvxopt_cp"]:<br>

        t = time.time()<br>

        model = Logit(y,X)<br>

        result = model.fit_regularized(method=<wbr>method, maxiter=maxiter,<br>

                                       alpha=1.,<br>

                                       abstol=tol,<br>

                                       acc=tol,<br>

                                       tol=tol, gtol=tol, pgtol=tol,<br>

                                       disp=DISP)<br>

        print "sm.Logit", method, time.time() - t<br>

<br>

if 1: # sm.Logit.fit<br>

    from statsmodels.discrete.discrete_<wbr>model import Logit<br>

<br>

    SOLVERS = ["newton", "nm",<br>

"bfgs","lbfgs","powell","cg","<wbr>ncg","basinhopping",]<br>

    for method in SOLVERS:<br>

        t = time.time()<br>

        model = Logit(y,X)<br>

        result = model.fit(method=method, maxiter=maxiter,<br>

                           niter=maxiter,<br>

                           ftol=tol,<br>

                           tol=tol, gtol=tol, pgtol=tol,  # Hmmm..<br>

needs to be reviewed.<br>

                           disp=DISP)<br>

        print "sm.Logit", method, time.time() - t<br>

<br>

if 1: # sm.GLM.fit<br>

<span class="gmail-">    from statsmodels.genmod.<wbr>generalized_linear_model import GLM<br>

    from statsmodels.genmod.<wbr>generalized_linear_model import families<br>

</span>    for method in ["IRLS"]:<br>

        t = time.time()<br>

        model = GLM(y, X, family=families.Binomial(link=<wbr>families.links.logit))<br>

        result = model.fit(method=method, cnvrg_tol=tol,<br>

maxiter=maxiter, full_output=False, disp=DISP)<br>

        print "sm.GLM.fit", method, time.time() - t<br>

<br>

<br>

if 1: # GLM.fit_regularized<br>

<span class="gmail-">    from statsmodels.genmod.<wbr>generalized_linear_model import GLM<br>

    from statsmodels.genmod.<wbr>generalized_linear_model import families<br>

</span>    t = time.time()<br>

    model = GLM(y, X, family=families.Binomial(link=<wbr>families.links.logit))<br>

    result = model.fit_regularized(method='<wbr>elastic_net', alpha=1.0,<br>

L1_wt=0.0, cnvrg_tol=tol, maxiter=maxiter)<br>

    print "sm.GLM L2:", time.time() - t<br>

<br>

<br>

<br>

if 0: # GLM.fit<br>

    # Hits division by zero.<br>

    SOLVERS = ["bfgs","lbfgs", "netwon", "nm",<br>

"powell","cg","ncg","<wbr>basinhopping",]<br>

<span class="gmail-">    from statsmodels.genmod.<wbr>generalized_linear_model import GLM<br>

    from statsmodels.genmod.<wbr>generalized_linear_model import families<br>

</span>    for method in SOLVERS:<br>

        t = time.time()<br>

        model = GLM(y, X, family=families.Binomial(link=<wbr>families.links.logit))<br>

        result = model.fit(method=method,<br>

#                           scale="X2",<br>

#                            alpha=1.,<br>

#                            abstol=tol,<br>

#                            acc=tol,<br>

#                            tol=tol, gtol=tol, pgtol=tol,<br>

#                            maxiter=maxiter,<br>

#                            #full_output=False,<br>

                           disp=DISP)<br>

        print "sm.GLM.fit", method, time.time() - t<br>

<div class="gmail-HOEnZb"><div class="gmail-h5"><br>

<br>

On Thu, Oct 5, 2017 at 10:32 AM, Sean Violante <<a href="mailto:sean.violante@gmail.com">sean.violante@gmail.com</a>> wrote:<br>

> Stuart<br>

> have you tried glmnet ( in R) there is a python version<br>

> <a href="https://web.stanford.edu/~hastie/glmnet_python/" rel="noreferrer" target="_blank">https://web.stanford.edu/~<wbr>hastie/glmnet_python/</a> ....<br>

><br>

><br>

><br>

><br>

> On Thu, Oct 5, 2017 at 6:34 PM, Stuart Reynolds <<a href="mailto:stuart@stuartreynolds.net">stuart@stuartreynolds.net</a>><br>

> wrote:<br>

>><br>

>> Thanks Josef. Was very useful.<br>

>><br>

>> result.remove_data() reduces a 5 parameter Logit result object from<br>

>> megabytes to 5Kb (as compared to a minimum uncompressed size of the<br>

>> parameters of ~320 bytes). Is big improvement. I'll experiment with<br>

>> what you suggest -- since this is still >10x larger than possible. I<br>

>> think the difference is mostly attribute names.<br>

>> I don't mind the lack of a multinomial support. I've often had better<br>

>> results mixing independent models for each class.<br>

>><br>

>> I'll experiment with the different solvers.  I tried the Logit model<br>

>> in the past -- its fit function only exposed a maxiter, and not a<br>

>> tolerance -- meaning I had to set maxiter very high. The newer<br>

>> statsmodels GLM module looks great and seem to solve this.<br>

>><br>

>> For other who come this way, I think the magic for ridge regression is:<br>

>><br>

>>         from statsmodels.genmod.<wbr>generalized_linear_model import GLM<br>

>>         from statsmodels.genmod.<wbr>generalized_linear_model import families<br>

>>         from statsmodels.genmod.<wbr>generalized_linear_model.<wbr>families import<br>

>> links<br>

>><br>

>>         model = GLM(y, Xtrain, family=families.Binomial(link=<wbr>links.Logit))<br>

>>         result = model.fit_regularized(method='<wbr>elastic_net',<br>

>> alpha=l2weight, L1_wt=0.0, tol=...)<br>

>>         result.remove_data()<br>

>>         result.predict(Xtest)<br>

>><br>

>> One last thing -- its clear that it should be possible to do something<br>

>> like scikit's LogisticRegressionCV in order to quickly optimize a<br>

>> single parameter by re-using past coefficients.<br>

>> Are there any wrappers in statsmodels for doing this or should I roll my<br>

>> own?<br>

>><br>

>><br>

>> - Stu<br>

>><br>

>><br>

>> On Wed, Oct 4, 2017 at 3:43 PM,  <<a href="mailto:josef.pktd@gmail.com">josef.pktd@gmail.com</a>> wrote:<br>

>> ><br>

>> ><br>

>> > On Wed, Oct 4, 2017 at 4:26 PM, Stuart Reynolds<br>

>> > <<a href="mailto:stuart@stuartreynolds.net">stuart@stuartreynolds.net</a>><br>

>> > wrote:<br>

>> >><br>

>> >> Hi Andy,<br>

>> >> Thanks -- I'll give another statsmodels another go.<br>

>> >> I remember I had some fitting speed issues with it in the past, and<br>

>> >> also some issues related their models keeping references to the data<br>

>> >> (=disaster for serialization and multiprocessing) -- although that was<br>

>> >> a long time ago.<br>

>> ><br>

>> ><br>

>> > The second has not changed and will not change, but there is a<br>

>> > remove_data<br>

>> > method that deletes all references to full, data sized arrays. However,<br>

>> > once<br>

>> > the data is removed, it is not possible anymore to compute any new<br>

>> > results<br>

>> > statistics which are almost all lazily computed.<br>

>> > The fitting speed depends a lot on the optimizer, convergence criteria<br>

>> > and<br>

>> > difficulty of the problem, and availability of good starting parameters.<br>

>> > Almost all nonlinear estimation problems use the scipy optimizers, all<br>

>> > unconstrained optimizers can be used. There are no optimized special<br>

>> > methods<br>

>> > for cases with a very large number of features.<br>

>> ><br>

>> > Multinomial/multiclass models don't support continuous response (yet),<br>

>> > all<br>

>> > other GLM and discrete models allow for continuous data in the interval<br>

>> > extension of the domain.<br>

>> ><br>

>> > Josef<br>

>> ><br>

>> ><br>

>> >><br>

>> >> - Stuart<br>

>> >><br>

>> >> On Wed, Oct 4, 2017 at 1:09 PM, Andreas Mueller <<a href="mailto:t3kcit@gmail.com">t3kcit@gmail.com</a>><br>

>> >> wrote:<br>

>> >> > Hi Stuart.<br>

>> >> > There is no interface to do this in scikit-learn (and maybe we should<br>

>> >> > at<br>

>> >> > this to the FAQ).<br>

>> >> > Yes, in principle this would be possible with several of the models.<br>

>> >> ><br>

>> >> > I think statsmodels can do that, and I think I saw another glm<br>

>> >> > package<br>

>> >> > for Python that does that?<br>

>> >> ><br>

>> >> > It's certainly a legitimate use-case but would require substantial<br>

>> >> > changes to the code. I think so far we decided not to support<br>

>> >> > this in scikit-learn. Basically we don't have a concept of a link<br>

>> >> > function, and it's a concept that only applies to a subset of models.<br>

>> >> > We try to have a consistent interface for all our estimators, and<br>

>> >> > this doesn't really fit well within that interface.<br>

>> >> ><br>

>> >> > Hth,<br>

>> >> > Andy<br>

>> >> ><br>

>> >> ><br>

>> >> > On 10/04/2017 03:58 PM, Stuart Reynolds wrote:<br>

>> >> >><br>

>> >> >> I'd like to fit a model that maps a matrix of continuous inputs to a<br>

>> >> >> target that's between 0 and 1 (a probability).<br>

>> >> >><br>

>> >> >> In principle, I'd expect logistic regression should work out of the<br>

>> >> >> box with no modification (although its often posed as being strictly<br>

>> >> >> for classification, its loss function allows for fitting targets in<br>

>> >> >> the range 0 to 1, and not strictly zero or one.)<br>

>> >> >><br>

>> >> >> However, scikit's LogisticRegression and LogisticRegressionCV reject<br>

>> >> >> target arrays that are continuous. Other LR implementations allow a<br>

>> >> >> matrix of probability estimates. Looking at:<br>

>> >> >><br>

>> >> >><br>

>> >> >><br>

>> >> >> <a href="http://scikit-learn-general.narkive.com/4dSCktaM/using-logistic-regression-on-a-continuous-target-variable" rel="noreferrer" target="_blank">http://scikit-learn-general.<wbr>narkive.com/4dSCktaM/using-<wbr>logistic-regression-on-a-<wbr>continuous-target-variable</a><br>

>> >> >> and the fix here:<br>

>> >> >> <a href="https://github.com/scikit-learn/scikit-learn/pull/5084" rel="noreferrer" target="_blank">https://github.com/scikit-<wbr>learn/scikit-learn/pull/5084</a>, which<br>

>> >> >> disables<br>

>> >> >> continuous inputs, it looks like there was some reason for this. So<br>

>> >> >> ... I'm looking for alternatives.<br>

>> >> >><br>

>> >> >> SGDClassifier allows log loss and (if I understood the docs<br>

>> >> >> correctly)<br>

>> >> >> adds a logistic link function, but also rejects continuous targets.<br>

>> >> >> Oddly, SGDRegressor only allows  ‘squared_loss’, ‘huber’,<br>

>> >> >> ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’, and doesn't<br>

>> >> >> seems to give a logistic function.<br>

>> >> >><br>

>> >> >> In principle, GLM allow this, but scikit's docs say the GLM models<br>

>> >> >> only allows strict linear functions of their input, and doesn't<br>

>> >> >> allow<br>

>> >> >> a logistic link function. The docs direct people to the<br>

>> >> >> LogisticRegression class for this case.<br>

>> >> >><br>

>> >> >> In R, there is:<br>

>> >> >><br>

>> >> >> glm(Total_Service_Points_Won/<wbr>Total_Service_Points_Played ~ ... ,<br>

>> >> >>      family = binomial(link=logit), weights =<br>

>> >> >> Total_Service_Points_Played)<br>

>> >> >> which would be ideal.<br>

>> >> >><br>

>> >> >> Is something similar available in scikit? (Or any continuous model<br>

>> >> >> that takes and 0 to 1 target and outputs a 0 to 1 target?)<br>

>> >> >><br>

>> >> >> I was surprised to see that the implementation of<br>

>> >> >> CalibratedClassifierCV(method=<wbr>"sigmoid") uses an internal<br>

>> >> >> implementation of logistic regression to do its logistic regressing<br>

>> >> >> --<br>

>> >> >> which I can use, although I'd prefer to use a user-facing library.<br>

>> >> >><br>

>> >> >> Thanks,<br>

>> >> >> - Stuart<br>

>> >> >> ______________________________<wbr>_________________<br>

>> >> >> scikit-learn mailing list<br>

>> >> >> <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

>> >> >> <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>

>> >> ><br>

>> >> ><br>

>> >> > ______________________________<wbr>_________________<br>

>> >> > scikit-learn mailing list<br>

>> >> > <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

>> >> > <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>

>> >> ______________________________<wbr>_________________<br>

>> >> scikit-learn mailing list<br>

>> >> <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

>> >> <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>

>> ><br>

>> ><br>

>> ><br>

>> > ______________________________<wbr>_________________<br>

>> > scikit-learn mailing list<br>

>> > <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

>> > <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>

>> ><br>

>> ______________________________<wbr>_________________<br>

>> scikit-learn mailing list<br>

>> <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

>> <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>

><br>

><br>

><br>

> ______________________________<wbr>_________________<br>

> scikit-learn mailing list<br>

> <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

> <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>

><br>

______________________________<wbr>_________________<br>

scikit-learn mailing list<br>

<a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>

</div></div></blockquote></div><br></div></div>