[scikit-learn] Using logistic regression with count proportions data

Raphael C drraph at gmail.com
Mon Oct 10 07:15:17 EDT 2016


How do I use sample_weight for my use case?

In my case, is "y" an array of 0s and 1s, and sample_weight an array of
real numbers between 0 and 1, where I should make sure to set
sample_weight[i] = 0 when y[i] = 0?
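
For concreteness, here is a minimal sketch of one way this is often set
up. It is an assumption on my part, not something confirmed in this
thread. Each row is duplicated, once with y = 1 and once with y = 0, and
sample_weight carries the number of successes and failures respectively,
so the weights are not zeroed for y = 0. The toy arrays X, proportion
and total are hypothetical, and the large C is only there to approximate
an unpenalised fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, the observed proportion for each
# row, and the total count that each proportion was computed from.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
proportion = np.array([0.1, 0.3, 0.6, 0.9])
total = np.array([10, 10, 20, 10])

# Duplicate every row: the first copy is labelled 1, the second 0.
n = len(X)
X_expanded = np.vstack([X, X])
y_expanded = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])

# Weight the y=1 copies by the success counts and the y=0 copies by the
# failure counts (e.g. proportion 0.3 of 10 gives weights 3 and 7).
weights = np.concatenate([proportion * total, (1.0 - proportion) * total])

# Large C weakens the default L2 penalty (an assumption for this sketch).
model = LogisticRegression(C=1e6)
model.fit(X_expanded, y_expanded, sample_weight=weights)

# Fitted proportions for the original rows.
predicted_proportion = model.predict_proba(X)[:, 1]
```

predict_proba then plays the role of the regression output: the second
column is the fitted proportion for each row.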

Raphael

On 10 October 2016 at 12:08, Sean Violante <sean.violante at gmail.com> wrote:
> That should be the sample_weight argument of fit:
>
> http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
>
> On Mon, Oct 10, 2016 at 1:03 PM, Raphael C <drraph at gmail.com> wrote:
>>
>> I just noticed this about the glm package in R.
>> http://stats.stackexchange.com/a/26779/53128
>>
>> "
>> The glm function in R allows 3 ways to specify the formula for a
>> logistic regression model.
>>
>> The most common is that each row of the data frame represents a single
>> observation and the response variable is either 0 or 1 (or a factor
>> with 2 levels, or other variable with only 2 unique values).
>>
>> Another option is to use a 2 column matrix as the response variable
>> with the first column being the counts of 'successes' and the second
>> column being the counts of 'failures'.
>>
>> You can also specify the response as a proportion between 0 and 1,
>> then specify another column as the 'weight' that gives the total
>> number that the proportion is from (so a response of 0.3 and a weight
>> of 10 is the same as 3 'successes' and 7 'failures')."
>>
>> Either of the last two options would do for me.  Does scikit-learn
>> support either of these last two options?
>>
>> Raphael
>>
>> On 10 October 2016 at 11:55, Raphael C <drraph at gmail.com> wrote:
>> > I am trying to perform regression where my dependent variable is
>> > constrained to be between 0 and 1. This constraint comes from the fact
>> > that it represents a count proportion. That is, counts in some category
>> > divided by a total count.
>> >
>> > In the literature it seems that one common way to tackle this is to
>> > use logistic regression. However, it appears that in scikit-learn
>> > logistic regression is only available as a classifier
>> > (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).
>> > Is that right?
>> >
>> > Is there another way to perform regression using scikit-learn where
>> > the dependent variable is a count proportion?
>> >
>> > Thanks for any help.
>> >
>> > Raphael
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn

