[scikit-learn] Why the default max_samples of Random Forest is X.shape[0]?
Fernando Marcos Wittmann
fernando.wittmann at gmail.com
Fri May 8 17:04:14 EDT 2020
While reading the Random Forest documentation, I came across the following:
```
max_samples : int or float, default=None
    If bootstrap is True, the number of samples to draw from X to train
    each base estimator.

    - If None (default), then draw `X.shape[0]` samples.
    - If int, then draw `max_samples` samples.
    - If float, then draw `max_samples * X.shape[0]` samples. Thus,
      `max_samples` should be in the interval `(0, 1)`.
```
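To make the three cases concrete, here is a minimal sketch of how I read the rules quoted above. The helper name `n_bootstrap_samples` is hypothetical and not part of scikit-learn, and the rounding in the float case is my assumption; the library may resolve it differently.

```python
def n_bootstrap_samples(n_samples, max_samples=None):
    """Number of samples drawn per tree, following the quoted docstring.

    Hypothetical helper for illustration only, not scikit-learn code.
    """
    if max_samples is None:
        # Default: draw as many samples as there are rows in X.
        return n_samples
    if isinstance(max_samples, int):
        # An int is taken as an absolute count.
        return max_samples
    # A float is taken as a fraction of the dataset size
    # (rounding to the nearest integer is an assumption).
    return int(round(n_samples * max_samples))


print(n_bootstrap_samples(150))        # default case
print(n_bootstrap_samples(150, 100))   # int case
print(n_bootstrap_samples(150, 0.5))   # float case
```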
Why is the whole dataset (i.e. X.shape[0] samples from X) used to
build each tree? That would be equivalent to setting bootstrap to False,
right? Wouldn't it be better practice to default to 2/3 of the size of the
dataset?
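One way to check the equivalence is to simulate the draw. The sketch below assumes the bootstrap samples with replacement, which is the usual meaning of the term; it uses only the standard library rather than scikit-learn, and `unique_fraction` is a hypothetical name. It draws n indices out of n and measures how many distinct rows end up in the sample.

```python
import random


def unique_fraction(n_samples, seed=0):
    """Fraction of distinct rows in one bootstrap sample of size n_samples."""
    rng = random.Random(seed)
    # Draw n_samples indices with replacement; the set keeps distinct ones.
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return len(drawn) / n_samples


frac = unique_fraction(100_000)
print(f"distinct rows in the bootstrap sample: {frac:.3f}")
```

For large n the expected fraction is 1 - (1 - 1/n)^n, which tends to 1 - 1/e, or roughly 0.632. If the assumption holds, each tree would see only about 63% of the distinct rows even with the default, which is close to the 2/3 mentioned above and not the same as bootstrap=False.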