[scikit-learn] Query regarding parameter class_weight in Random Forest Classifier

Debabrata Ghosh mailfordebu at gmail.com
Sun Jan 22 08:00:31 EST 2017


Thanks, Josh!

I have used the parameter class_weight={0: 1, 1: 10} and the model code
ran successfully. However, to get further clarity on the concept, I have
another question for you. I ran the following two tests:

1. In my dataset, I have 1 million negative samples and 10,000 positive
samples. First I ran my model code without supplying any class_weight
parameter, and it gave me certain true positive and false positive counts.

2. In the second test, I kept the same 1 million negative samples but
reduced the positive samples to 1,000. This time I supplied the
parameter class_weight={0: 1, 1: 10} and recorded the true positive and
false positive counts again.
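
To make the reasoning behind the two tests concrete, here is a minimal sketch of the per-sample weights each setup implies. It uses `compute_sample_weight` from `sklearn.utils.class_weight`; the label arrays are synthetic stand-ins that only reproduce the class counts described above, and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic label vectors with the class counts described in the two tests.
y_test1 = np.concatenate([np.zeros(1_000_000, dtype=int), np.ones(10_000, dtype=int)])
y_test2 = np.concatenate([np.zeros(1_000_000, dtype=int), np.ones(1_000, dtype=int)])

# Test 1: no class_weight, so every sample counts with weight 1.
w_test1 = np.ones(len(y_test1), dtype=float)

# Test 2: per-sample weights implied by class_weight={0: 1, 1: 10}.
w_test2 = compute_sample_weight({0: 1, 1: 10}, y_test2)

# Total weight carried by the positive class in each setup.
total_pos_1 = w_test1[y_test1 == 1].sum()
total_pos_2 = w_test2[y_test2 == 1].sum()
print(total_pos_1, total_pos_2)
```

The positive class carries the same total weight in both setups, but test 2 still contains only 1,000 distinct positive examples, so the two training sets are not interchangeable.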

My question is: when I multiply the results from my second test by a
factor of 10, they do not match the results from my first test. For
example, at a given threshold I get 8 true positives in the second test,
while the first test gives 260 true positives at the same threshold. I see
the same pattern for the false positives: multiplying the counts from the
second test by 10 does not come close to the counts from the first test.

Is my expectation correct? Is my way of running the test (i.e., reducing
the positive samples by a factor of 10 and then giving the positive class
a weight 10 times that of the negative class) and comparing the results
with a model run without any class_weight parameter correct?
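
The comparison described above can be sketched on a small synthetic dataset. This is only an illustration under stated assumptions: the data comes from `make_classification` rather than the real dataset, the sizes, the 0.5 threshold and the helper `tp_fp` are hypothetical, and both models are scored on the same held-out set so that their counts can be compared directly without any rescaling.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption: stands in for the real dataset).
X, y = make_classification(n_samples=20000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

def tp_fp(model, threshold=0.5):
    """Count true and false positives on the shared held-out set."""
    proba = model.predict_proba(X_test)[:, 1]
    pred = proba >= threshold
    tp = int(np.sum(pred & (y_test == 1)))
    fp = int(np.sum(pred & (y_test == 0)))
    return tp, fp

# Test 1: all positives, no class_weight.
rf_full = RandomForestClassifier(n_estimators=50, random_state=0)
rf_full.fit(X_train, y_train)

# Test 2: keep one tenth of the positives and weight them by 10.
rng = np.random.RandomState(0)
pos = np.flatnonzero(y_train == 1)
neg = np.flatnonzero(y_train == 0)
keep_pos = rng.choice(pos, size=len(pos) // 10, replace=False)
idx = np.concatenate([neg, keep_pos])
rf_sub = RandomForestClassifier(n_estimators=50,
                                class_weight={0: 1, 1: 10},
                                random_state=0)
rf_sub.fit(X_train[idx], y_train[idx])

counts_full = tp_fp(rf_full)
counts_sub = tp_fp(rf_sub)
print("test 1 (TP, FP):", counts_full)
print("test 2 (TP, FP):", counts_sub)
```

Because both models see the same evaluation samples, any gap between the two pairs of counts reflects the models themselves and not a difference in how many positives were available to be found.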

Please let me know at your convenience, as this will help me a great deal
in understanding the concept further.

Thanks in advance!

On Sun, Jan 22, 2017 at 1:56 AM, Josh Vredevoogd <cleverless at gmail.com>
wrote:

> The class_weight parameter doesn't behave the way you're expecting.
>
> The value in class_weight is the weight applied to each sample in that
> class - in your example, each class zero sample has weight 0.001 and each
> class one sample has weight 0.999, so each class one sample carries 999
> times the weight of a class zero sample.
>
> If you would like each class one sample to have ten times the weight, you
> would set `class_weight={0: 1, 1: 10}` or `class_weight={0:0.1, 1:1}`
> equivalently.
>
>
> On Sat, Jan 21, 2017 at 10:18 AM, Debabrata Ghosh <mailfordebu at gmail.com>
> wrote:
>
>> Hi All,
>>              Greetings !
>>
>>               I have a very basic question regarding the usage of the
>> parameter class_weight in scikit learn's Random Forest Classifier's fit
>> method.
>>
>>               I have a fairly unbalanced sample and my positive class :
>> negative class ratio is 1:100. In other words, I have a million records
>> corresponding to negative class and 10,000 records corresponding to
>> positive class. I have trained the random forest classifier model using the
>> above record set successfully.
>>
>>               Further, for a different problem, I want to test the
>> parameter class_weight. So, I am setting class_weight to [0:0.001 ,
>> 1:0.999] and have tried running my model on the same dataset as described
>> in the above paragraph, but with the positive class records reduced to 1000
>> [because now each positive record is given approximately 10 times more
>> weight than a negative record]. However, the results are very different
>> between the two runs (with and without class_weight), and I expected
>> similar results.
>>
>>                 Would you please let me know where I am going wrong?
>> I know it's something silly, but I just want to improve my understanding
>> of the concept.
>>
>> Thanks !
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>