[scikit-learn] Understanding max_features parameter in RandomForestClassifier

aditya aggarwal adityaselfefficient at gmail.com
Wed Mar 11 01:10:10 EDT 2020


This is a question about RandomForestClassifier in sklearn.

The max_features parameter sets the maximum number of features considered
for a split in a random forest; the default is sqrt(n_features). If m is
sqrt(n), then the number of feature combinations available for building a
decision tree is nCm. What if nCm is less than n_estimators (the number of
decision trees in the random forest)?

*Example:* For n = 7 and max_features = 3, nCm is 35, meaning there are 35
unique combinations of features for the decision trees. With
n_estimators = 100, will the remaining 65 trees have repeated combinations
of features? If so, won't the trees be correlated, introducing bias in the
result?
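
The arithmetic in the example can be checked with a short sketch. The
variable names are only illustrative, and the idea that each tree is tied to
one fixed subset of features is the premise of my question, not something
taken from the scikit-learn documentation.

```python
import math

n_features = 7      # n in the example above
max_features = 3    # m in the example above
n_estimators = 100  # number of trees in the forest

# Number of distinct feature subsets of size m drawn from n features (nCm).
n_subsets = math.comb(n_features, max_features)

# ASSUMPTION (premise of the question): each tree uses exactly one subset.
# Under that premise, the pigeonhole principle gives a lower bound on how
# many trees must reuse a subset that another tree already has.
min_repeats = max(0, n_estimators - n_subsets)

print(n_subsets, min_repeats)  # prints: 35 65
```

The 65 here is only a lower bound under that premise: if the subsets were
drawn at random, some could repeat before all 35 had been used.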


Thanks

Aditya Aggarwal