[scikit-learn] Understanding max_features parameter in RandomForestClassifier

aditya aggarwal adityaselfefficient at gmail.com
Wed Mar 11 01:43:02 EDT 2020


With all the parameters set to their defaults (especially bootstrap and
max_samples), the number of samples passed to each estimator is X.shape[0].
Doesn't that account for all the instances in the dataset, with the calculated
number of features? Then how come only a subset is given to the estimator?
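A rough way to see why a default-sized bootstrap sample is still a subset: with bootstrap=True and max_samples=None, each tree is trained on X.shape[0] rows drawn with replacement, so some rows repeat and others are left out. The NumPy simulation below is a minimal sketch that mimics that draw. It is not scikit-learn's internal code, and the helper name bootstrap_unique_fraction is hypothetical. The fraction of distinct rows should land near 1 - (1 - 1/n)^n, which approaches 1 - 1/e (about 0.632) for large n.

```python
import numpy as np


def bootstrap_unique_fraction(n_samples, seed=0):
    """Hypothetical helper: draw n_samples row indices with replacement
    (a bootstrap sample the same size as the dataset) and return the
    fraction of distinct rows that ended up in the sample."""
    rng = np.random.default_rng(seed)
    indices = rng.integers(0, n_samples, size=n_samples)  # sampling with replacement
    return np.unique(indices).size / n_samples


n = 10_000
observed = bootstrap_unique_fraction(n)
# Probability that a given row is drawn at least once; tends to 1 - 1/e.
expected = 1 - (1 - 1 / n) ** n
print(f"distinct rows: {observed:.3f}, expected: {expected:.3f}")
```

So although each tree receives X.shape[0] rows, only about 63% of them are distinct instances and the rest are duplicates, which is the subset the reply below refers to.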

On Wed, Mar 11, 2020 at 10:58 AM Brown J.B. via scikit-learn <
scikit-learn at python.org> wrote:

> Regardless of the number of features, each DT estimator is given only a
> subset of the data.
> Each DT estimator then uses the features to derive decision rules for the
> samples it was given.
> With more trees and few examples, you might get similar or identical
> trees, but that is not the norm.
>
> Pardon brevity.
> J.B.
>
> On Wed, Mar 11, 2020 at 14:11, aditya aggarwal <adityaselfefficient at gmail.com> wrote:
>
>> For RandomForestClassifier in sklearn:
>>
>> The max_features parameter gives the maximum number of features considered
>> for a split in the random forest, which is sqrt(n_features) by default. If m
>> is sqrt(n), then the number of feature combinations for DT formation is nCm.
>> What if nCm is less than n_estimators (the number of decision trees in the
>> random forest)?
>>
>> *example:* For n = 7 and max_features = 3, nCm is 35, meaning 35 unique
>> combinations of features for the decision trees. Now for n_estimators = 100,
>> will the remaining 65 trees have repeated combinations of features? If so,
>> won't the trees be correlated, introducing bias in the answer?
>>
>>
>> Thanks
>>
>> Aditya Aggarwal
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
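On the combinatorics in the original question: scikit-learn's documentation describes max_features as the number of features to consider when looking for the best split, so a fresh random subset of features is drawn at each node rather than one fixed subset per tree. The sketch below checks the counts from the question and then counts how many distinct features each fitted tree actually splits on. It assumes a synthetic dataset from make_classification with arbitrarily chosen sizes and seeds, and features_used is a hypothetical helper. Note that the "sqrt" setting floors the square root, so for n = 7 scikit-learn considers 2 features per split, not 3.

```python
import math

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

n_features = 7

# The question's count of feature subsets of size 3 out of 7.
print(math.comb(n_features, 3))  # 35
# scikit-learn's "sqrt" rule floors the square root: int(sqrt(7)) == 2.
m = max(1, int(math.sqrt(n_features)))
print(m, math.comb(n_features, m))  # 2 21

# Synthetic data (assumed sizes and seed, chosen only for illustration).
X, y = make_classification(n_samples=500, n_features=n_features,
                           n_informative=5, n_redundant=0, random_state=0)
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X, y)


def features_used(tree):
    """Hypothetical helper: distinct feature indices used by the split
    (internal) nodes of one fitted tree."""
    feats = tree.tree_.feature
    return set(feats[feats >= 0].tolist())  # leaf nodes carry a negative value


# max_features is applied at every split, so a single tree can end up
# splitting on more than m distinct features overall.
per_tree = [features_used(est) for est in forest.estimators_]
print(max(len(s) for s in per_tree))  # typically greater than m
```

In a run like this, individual trees typically split on more than m features, so a tree is not tied to one of the nCm subsets; diversity between trees comes from both the bootstrap rows and the per-node feature draws.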

