[scikit-learn] Understanding max_features parameter in RandomForestClassifier

Venkatachalam N venky.yuvy at gmail.com
Wed Mar 11 04:18:27 EDT 2020


Hi Aditya,

With the default settings (`bootstrap=True`), the sampling is done with
replacement. Hence, each tree gets a different dataset even though you draw
the same number (`X.shape[0]`) of data points every time: some rows are
picked more than once and others are left out.
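
To make that concrete, here is a minimal sketch of drawing with replacement. It
uses only the Python standard library rather than scikit-learn's own sampling
code, and the helper name `bootstrap_indices`, the seed and `n_samples = 1000`
are just assumptions for illustration.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (hypothetical helper)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


n_samples = 1000          # stands in for X.shape[0]; value chosen for illustration
rng = random.Random(0)    # fixed seed so the sketch is repeatable

tree_a = bootstrap_indices(n_samples, rng)
tree_b = bootstrap_indices(n_samples, rng)

# Both draws have exactly n_samples entries ...
print(len(tree_a), len(tree_b))

# ... but each contains repeats, so only part of the dataset is covered.
# In expectation the covered fraction is about 1 - 1/e, i.e. roughly 0.63.
unique_fraction = len(set(tree_a)) / n_samples
print(round(unique_fraction, 2))

# The two trees also see different rows from each other.
print(tree_a != tree_b)
```

So "X.shape[0] samples per tree" and "each tree sees only a subset" are both
true at once.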

Regards,
Venkatachalam N.
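
On the original nCm question: as far as I understand the documentation,
`max_features` is the number of features considered when looking for the best
split, and a fresh random subset is drawn at every split rather than once per
tree. Under that assumption a tree is not tied to a single one of the 35
combinations. The sketch below is plain Python arithmetic, not scikit-learn
code; `n_splits = 10` and the seed are hypothetical values, and `max_features
= 3` is simply the number used in the question.

```python
import math
import random

n_features = 7
max_features = 3   # value taken from the question
n_splits = 10      # hypothetical number of split nodes in one tree

# Number of distinct candidate subsets available at a single split.
subsets_per_split = math.comb(n_features, max_features)
print(subsets_per_split)  # 35

# If every split draws its own subset, one tree corresponds to a whole
# sequence of subsets, so the count grows as 35 ** n_splits.
sequences = subsets_per_split ** n_splits
print(sequences)

# Simulate the candidate subsets drawn at each split of one tree.
rng = random.Random(0)
candidates = [
    tuple(sorted(rng.sample(range(n_features), max_features)))
    for _ in range(n_splits)
]
features_seen = {f for subset in candidates for f in subset}

# Across its splits, a tree can be offered more than max_features features.
print(len(features_seen))
```

Together with the bootstrap resampling of the rows, this is why 100 trees do
not have to collapse into 35 variants.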



On Wed, Mar 11, 2020 at 11:14 AM aditya aggarwal <
adityaselfefficient at gmail.com> wrote:

> With all the parameters left at their defaults (especially bootstrap and
> max_samples), the number of samples passed to each estimator is X.shape[0].
> Doesn't that account for all the instances in the dataset, with the
> calculated number of features? Then how is only a subset given to each
> estimator?
>
> On Wed, Mar 11, 2020 at 10:58 AM Brown J.B. via scikit-learn <
> scikit-learn at python.org> wrote:
>
>> Regardless of the number of features, each DT estimator is given only a
>> subset of the data.
>> Each DT estimator then uses the features to derive decision rules for the
>> samples it was given.
>> With more trees and few examples, you might get similar or identical
>> trees, but that is not the norm.
>>
>> Pardon brevity.
>> J.B.
>>
>> On Wed, Mar 11, 2020 at 14:11 aditya aggarwal <adityaselfefficient at gmail.com> wrote:
>>
>>> For RandomForestClassifier in sklearn
>>>
>>> The max_features parameter gives the maximum number of features for a
>>> split in a random forest, which is sqrt(n_features) by default. If m is
>>> the square root of n, then the number of feature combinations for building
>>> a decision tree is nCm. What if nCm is less than n_estimators (the number
>>> of decision trees in the random forest)?
>>>
>>> *example:* For n = 7, max_features is 3, so nCm is 35, meaning 35
>>> unique combinations of features for the decision trees. Now for
>>> n_estimators = 100, will the remaining 65 trees have repeated combinations
>>> of features? If so, won't the trees be correlated, introducing bias in the
>>> answer?
>>>
>>>
>>> Thanks
>>>
>>> Aditya Aggarwal
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>