[scikit-learn] Understanding max_features parameter in RandomForestClassifier

Brown J.B. jbbrown at kuhp.kyoto-u.ac.jp
Wed Mar 11 01:26:50 EDT 2020


Regardless of the number of features, each DT estimator is given only a
subset (a bootstrap sample) of the data.
Each DT estimator then uses the features to derive decision rules for the
samples it was given.
With many trees and few samples you might get similar or even identical
trees, but that is not the norm.
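
A minimal sketch (not from the original reply) that illustrates this: in
scikit-learn, max_features is the number of candidate features drawn anew
at each split, and with bootstrap=True each tree also fits a different
bootstrap sample, so trees rarely coincide. The synthetic dataset and
random_state below are arbitrary choices; n_features=7, max_features=3 and
n_estimators=100 simply mirror the numbers in the question.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with the 7 features used in the question's example.
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=100, max_features=3,
                                bootstrap=True, random_state=0)
forest.fit(X, y)

# tree_.feature holds the feature index tested at each node (-2 at leaves).
# Because a fresh subset of 3 candidates is drawn at every split, a single
# tree usually ends up using more than 3 distinct features overall.
used_per_tree = [{int(f) for f in est.tree_.feature if f >= 0}
                 for est in forest.estimators_]
print("features used by the first tree:", sorted(used_per_tree[0]))
print("min/max distinct features per tree:",
      min(len(s) for s in used_per_tree), max(len(s) for s in used_per_tree))

# Different bootstrap samples also give structurally different trees.
print("distinct node counts across the 100 trees:",
      len({est.tree_.node_count for est in forest.estimators_}))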

Pardon brevity.
J.B.

On Wed, Mar 11, 2020 at 14:11 aditya aggarwal <adityaselfefficient at gmail.com> wrote:

> For RandomForestClassifier in sklearn
>
> The max_features parameter gives the maximum number of features considered
> for a split in the random forest, which defaults to sqrt(n_features). If m
> is sqrt(n), then the number of feature combinations available for building
> a DT is nCm. What if nCm is less than n_estimators (the number of decision
> trees in the random forest)?
>
> *example:* For n = 7, max_features is 3, so nCm is 35, meaning 35 unique
> combinations of features for the decision trees (a quick check of this
> count follows the quoted message). Now, for n_estimators = 100, will the
> remaining 65 trees have repeated combinations of features? If so, won't
> the trees be correlated, introducing bias into the answer?
>
>
> Thanks
>
> Aditya Aggarwal
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
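
As a side note (not from the original exchange), the counting in the
question does check out under the per-tree reading of max_features, even
though scikit-learn actually re-samples the candidate features at every
split:

from math import comb

n_features, max_features, n_estimators = 7, 3, 100
n_subsets = comb(n_features, max_features)   # 35 distinct 3-of-7 feature subsets
print(n_subsets)                             # 35
print(n_subsets < n_estimators)              # True: fewer subsets than trees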
