<div dir="ltr"><div><div>Dear community,<br><br></div>This is a question about how to interpret the documentation and semantics of the random forest constructors.<br><br></div><div>In forest.py (version 0.17, which I am still using), the documentation for the number of features to consider states, on lines 742-745 of the source code, that the search may effectively inspect more than `max_features` features when determining which feature to use to split a node.<br></div><div>It also states that this parameter is tree-specific.<br><br></div><div>Am I correct in the following interpretations?<br><br></div><div>Interpretation #1 - For bootstrap=True, sampling with replacement draws as many samples as there are training instances, meaning that the subsample presented to a particular tree will likely contain duplicates and therefore not the full input training set, whereas for bootstrap=False, the entire dataset will be presented to each tree?<br><br></div><div>Interpretation #2 - In particular, given that the documentation states that "The sub-sample size is always the same as the original
input sample size...", it seems to me that bootstrap=False then provides the entire training dataset to each decision tree, so that what the tree becomes is determined by which features are randomly selected first from the features given.<br></div><div>That would suggest that, if bootstrap=False, the number of trees is high, and the feature dimensionality is very low, then there is a high probability that multiple copies of the same tree will emerge in the forest.<br><br>Interpretation #3 - The feature subset is not subsampled per tree, but
rather all features are presented for the subsampled training data
provided to a tree? For example, if the dimensionality is 400 on a
6000-sample training dataset that has been randomly subsampled (with bootstrap=True) to yield 4700 unique training samples,
then the tree builder will consider all 400 dimensions/features with
respect to the 4700 samples, picking at most `max_features`
features (out of 400) for building splits in the tree? So by default
(sqrt/auto), there would be at most 20 splits in the tree?<br><br></div><div>Confirmations, denials, and corrections to my interpretations are _highly_ welcome.<br></div><div><br></div><div>As always, my great thanks to the community.<br></div><div><br></div><div>With kind regards,<br></div><div>J.B. Brown<br></div><div>Kyoto University Graduate School of Medicine<br></div></div>
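<div dir="ltr"><div>For reference, the arithmetic behind the interpretations above can be checked with a minimal sketch, assuming only NumPy. It does not call scikit-learn and is not the library's implementation; the helper name `bootstrap_unique_count` is hypothetical. It draws 6000 indices with replacement, counts the distinct ones, and compares the count to the expectation n * (1 - (1 - 1/n)^n), which is roughly 63% of n. It also computes the square root of 400, the value behind the "20" quoted in Interpretation #3.<br><br></div>
<pre>
```python
import numpy as np


def bootstrap_unique_count(n_samples, seed=0):
    """Draw n_samples indices with replacement and count the distinct ones.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    rng = np.random.RandomState(seed)
    indices = rng.randint(0, n_samples, size=n_samples)
    return np.unique(indices).size


n_samples = 6000
n_unique = bootstrap_unique_count(n_samples)

# Expected number of distinct instances in a bootstrap sample of size n.
expected = n_samples * (1 - (1 - 1.0 / n_samples) ** n_samples)
print("unique instances in one bootstrap sample:", n_unique)
print("expected number of unique instances:", round(expected))

# Square root of the feature count used in the example (400 features).
n_features = 400
sqrt_features = int(np.sqrt(n_features))
print("sqrt of the feature count:", sqrt_features)
```
</pre>
<div>Under these assumptions, a bootstrap sample of 6000 instances is expected to contain about 3793 distinct instances, so the 4700 used in the example above should be read as illustrative only.<br></div></div>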