[scikit-learn] Using logistic regression with count proportions data

Sean Violante sean.violante at gmail.com
Mon Oct 10 07:08:28 EDT 2016


This should be possible with the sample_weight argument of fit; see:

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
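
A minimal sketch of what I mean, assuming you have the success and failure counts for each row. Each row is entered twice, once with label 1 weighted by its success count and once with label 0 weighted by its failure count. The helper name fit_proportion_logit and the toy data are made up for illustration. Note that LogisticRegression applies an L2 penalty by default, so the large C here is my assumption for approximating an unpenalized fit like R's glm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_proportion_logit(X, successes, failures, C=1e6):
    """Fit a logistic regression to count data via sample_weight.

    Hypothetical helper: each row of X appears once as a 'success'
    (label 1) and once as a 'failure' (label 0), weighted by its counts.
    """
    X = np.asarray(X, dtype=float)
    successes = np.asarray(successes, dtype=float)
    failures = np.asarray(failures, dtype=float)

    # Stack the design matrix twice: first copy labelled 1, second labelled 0.
    X_long = np.vstack([X, X])
    y_long = np.concatenate([np.ones(len(X), dtype=int),
                             np.zeros(len(X), dtype=int)])
    w_long = np.concatenate([successes, failures])

    # Drop rows with zero weight; they contribute nothing to the fit.
    keep = w_long > 0

    # Large C means weak regularization (an assumption, see above).
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_long[keep], y_long[keep], sample_weight=w_long[keep])
    return model


# Toy data: the proportion of successes rises with the single feature.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
successes = np.array([1, 3, 6, 9])
failures = np.array([9, 7, 4, 1])

model = fit_proportion_logit(X, successes, failures)

# Column 1 of predict_proba is the estimated proportion of successes.
predicted = model.predict_proba(X)[:, 1]
print(predicted)
```

If you only have a proportion p and a total n per row, the counts are n * p and n * (1 - p), and the weights do not need to be integers.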

On Mon, Oct 10, 2016 at 1:03 PM, Raphael C <drraph at gmail.com> wrote:

> I just noticed this about the glm package in R.
> http://stats.stackexchange.com/a/26779/53128
>
> "
> The glm function in R allows 3 ways to specify the formula for a
> logistic regression model.
>
> The most common is that each row of the data frame represents a single
> observation and the response variable is either 0 or 1 (or a factor
> with 2 levels, or other variable with only 2 unique values).
>
> Another option is to use a 2 column matrix as the response variable
> with the first column being the counts of 'successes' and the second
> column being the counts of 'failures'.
>
> You can also specify the response as a proportion between 0 and 1,
> then specify another column as the 'weight' that gives the total
> number that the proportion is from (so a response of 0.3 and a weight
> of 10 is the same as 3 'successes' and 7 'failures')."
>
> Either of the last two options would do for me.  Does scikit-learn
> support either of these last two options?
>
> Raphael
>
> On 10 October 2016 at 11:55, Raphael C <drraph at gmail.com> wrote:
> > I am trying to perform regression where my dependent variable is
> > constrained to be between 0 and 1. This constraint comes from the fact
> > that it represents a count proportion. That is counts in some category
> > divided by a total count.
> >
> > In the literature it seems that one common way to tackle this is to
> > use logistic regression. However, it appears that in scikit learn
> > logistic regression is only available as a classifier
> > (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).
> > Is that right?
> >
> > Is there another way to perform regression using scikit learn where
> > the dependent variable is a count proportion?
> >
> > Thanks for any help.
> >
> > Raphael