[scikit-learn] Random Forest with Bootstrapping

Алексей Драль aadral at gmail.com
Mon Oct 3 14:34:04 EDT 2016


From the docs:

The RandomForestClassifier is trained using bootstrap aggregation, where
each new tree is fit from a bootstrap sample of the training observations
z_i = (x_i, y_i). The out-of-bag (OOB) error is the average error for each
z_i calculated using predictions from the trees that do not contain z_i in
their respective bootstrap sample. This allows the RandomForestClassifier
to be fit and validated whilst being trained [1].
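
To see this from the user side, here is a minimal sketch, assuming a
synthetic dataset from make_classification (the data, sizes and seeds are
my own choices for illustration, not something from the docs). With
oob_score=True the forest reports its out-of-bag accuracy after fit, and the
OOB error is one minus that value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration (assumed, not from the thread).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# bootstrap=True is the default; oob_score=True asks the forest to score each
# observation using only the trees whose bootstrap sample did not contain it.
clf = RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,
    oob_score=True,
    random_state=0,
)
clf.fit(X, y)

oob_accuracy = clf.oob_score_   # accuracy estimated on out-of-bag samples
oob_error = 1.0 - oob_accuracy  # the OOB error described in the docs
print(f"OOB accuracy: {oob_accuracy:.3f}, OOB error: {oob_error:.3f}")
```

No separate validation split is needed here: the estimate comes from the
same fit call.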

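When you sample with replacement, each bootstrap sample has the same size as
the original dataset but contains duplicates, so some of the z_i are very
likely to be left out of the training set of any given tree (on average
about 37% of them, since (1 - 1/n)^n approaches 1/e as n grows). Such a tree
is then used when estimating the OOB error for the z_i it did not see. I
hope that makes it a little bit clearer.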
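
A quick way to check this is to simulate the sampling. The sketch below uses
only the standard library; the helper name oob_fraction and the sizes
(1000 observations, 200 simulated trees) are hypothetical choices of mine.
It draws n indices with replacement and measures the fraction of
observations that were never drawn.

```python
import random


def oob_fraction(n_samples, rng):
    """Fraction of observations missing from one bootstrap sample."""
    # Draw n_samples indices with replacement; the set keeps the unique ones.
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1.0 - len(drawn) / n_samples


rng = random.Random(0)
n_samples = 1000  # size of the original dataset (assumed)
n_trees = 200     # number of simulated bootstrap samples (assumed)

fractions = [oob_fraction(n_samples, rng) for _ in range(n_trees)]
mean_fraction = sum(fractions) / n_trees

# Probability that a given observation is never picked in n draws.
expected = (1.0 - 1.0 / n_samples) ** n_samples

print(f"simulated OOB fraction: {mean_fraction:.3f}")
print(f"theoretical value:      {expected:.3f}")
```

So even though every tree sees a sample of the full size, roughly a third of
the observations are out-of-bag for it.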

2016-10-03 19:25 GMT+01:00 Ibrahim Dalal via scikit-learn <
scikit-learn at python.org>:

> Dear Developers,
> From whatever little knowledge I gained last night about Random Forests,
> each tree is trained on a sub-sample of the original dataset (usually with
> replacement)?
> (Note: Please do correct me if I am not making any sense.)
> RandomForestClassifier has a 'bootstrap' option. The API states the
> following:
>> The sub-sample size is always the same as the original input sample size
>> but the samples are drawn with replacement if bootstrap=True (default).
> Now, what I am not able to understand is: if the entire dataset is used to
> train each of the trees, then how does the classifier estimate the OOB
> error? None of the entries of the dataset is out-of-bag for any of the trees.
> (Pardon me if all this sounds BS)
> Help this mere mortal.
> Thanks

Yours sincerely,
Alexey A. Dral