[scikit-learn] Weighted Random Forest vs. "class_weight" in RandomForestClassifier

Kristen M. Altenburger kaltenb at stanford.edu
Wed Oct 16 19:34:59 EDT 2019


Hi All,

I posted the same question on StackExchange (https://stats.stackexchange.com/questions/431777/class-weight-in-random-forest-vs-breimans-weighted-random-forest), but I am also circulating it here in case someone knows :)


I am unsure whether the "class_weight" parameter in scikit-learn's RandomForestClassifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) is equivalent to Chen/Breiman's notion of "Weighted Random Forest" described in Section 2.3 of their report (https://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf). In short, "Weighted Random Forest" will "...assign a weight to each class, with the minority class given larger weight (i.e., higher misclassification cost). The class weights are incorporated into the RF algorithm in two places. In the tree induction procedure, class weights are used to weight the Gini criterion for finding splits. In the terminal nodes of each tree, class weights are again taken into consideration. The class prediction of each terminal node is determined by “weighted majority vote”; i.e., the weighted vote of a class is the weight for that class times the number of cases for that class at the terminal node. The final class prediction for RF is then determined by aggregating the weighted vote from each individual tree, where the weights are average weights in the terminal nodes."
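
To make the quoted description concrete, here is a minimal sketch of how I read its two per-tree ingredients: a Gini impurity computed on class counts scaled by class weights, and a terminal-node "weighted majority vote". The function names (weighted_class_mass, weighted_gini, weighted_leaf_vote) are my own and hypothetical; this is not code from the report or from scikit-learn, and it leaves out the final aggregation across trees.

```python
def weighted_class_mass(class_counts, class_weights):
    """Per-class count times per-class weight (hypothetical helper)."""
    return [n * w for n, w in zip(class_counts, class_weights)]


def weighted_gini(class_counts, class_weights):
    """Gini impurity of a node, using weighted counts instead of raw counts."""
    mass = weighted_class_mass(class_counts, class_weights)
    total = sum(mass)
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass)


def weighted_leaf_vote(class_counts, class_weights):
    """Index of the class with the largest (weight * count) in a terminal node."""
    mass = weighted_class_mass(class_counts, class_weights)
    return max(range(len(mass)), key=mass.__getitem__)


# A node holding 90 majority-class cases and 10 minority-class cases.
counts = [90, 10]
print(weighted_gini(counts, [1.0, 1.0]))        # impurity with equal weights
print(weighted_gini(counts, [1.0, 9.0]))        # impurity once the minority is up-weighted
print(weighted_leaf_vote(counts, [1.0, 1.0]))   # majority class wins the plain vote
print(weighted_leaf_vote(counts, [1.0, 10.0]))  # minority class wins the weighted vote
```

With equal weights the node looks fairly pure and votes for the majority class; with a large enough minority weight, the same node looks balanced to the split criterion and its vote flips.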

Question: I can't tell from the Python source code for RandomForestClassifier whether class_weight is used to weight the Gini criterion for finding splits. Is it? And if not, can anyone recommend code that implements Weighted Random Forest?
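
One way I could probe this empirically, as a sketch and not a reading of the source: fit a single DecisionTreeClassifier with and without class_weight, and also with an equivalent per-sample weight passed through fit(..., sample_weight=...), then compare the fitted trees. The synthetic data, the 1:9 weights, and the variable names are arbitrary choices of mine.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy problem: roughly 90% class 0 and 10% class 1.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)

class_weight = {0: 1.0, 1: 9.0}  # arbitrary up-weighting of the minority class

plain = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
weighted = DecisionTreeClassifier(
    max_depth=3, random_state=0, class_weight=class_weight
).fit(X, y)

# The same weights expressed per sample instead of per class.
sample_weight = np.where(y == 1, 9.0, 1.0)
via_sample_weight = DecisionTreeClassifier(max_depth=3, random_state=0).fit(
    X, y, sample_weight=sample_weight
)

# Do class_weight and the equivalent sample_weight give the same splits?
same_as_sample_weight = np.array_equal(
    weighted.tree_.feature, via_sample_weight.tree_.feature
) and np.allclose(weighted.tree_.threshold, via_sample_weight.tree_.threshold)

# Impurity stored at the root node, with and without class weights.
root_impurity_plain = plain.tree_.impurity[0]
root_impurity_weighted = weighted.tree_.impurity[0]

print("same splits as sample_weight:", same_as_sample_weight)
print("root impurity, plain vs weighted:", root_impurity_plain, root_impurity_weighted)
```

If the splits match and the root impurity shifts once class_weight is set, that would suggest the class weights enter the split criterion the same way sample weights do. It would not settle whether the forest aggregates tree votes the way Chen/Breiman describe.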

Thanks!
Kristen
http://kaltenburger.github.io/