[scikit-learn] Random Forest with Bootstrapping

Mon Oct 3 14:34:04 EDT 2016

Hi,

From the docs
(http://scikit-learn.org/stable/auto_examples/ensemble/plot_ensemble_oob.html):

The RandomForestClassifier is trained using bootstrap aggregation, where
each new tree is fit from a bootstrap sample of the training observations
z_i = (x_i, y_i). The out-of-bag (OOB) error is the average error for each
z_i calculated using predictions from the trees that do not contain z_i in
their respective bootstrap sample. This allows the RandomForestClassifier
to be fit and validated whilst being trained [1].
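
In practice you can ask scikit-learn for this estimate directly. Below is a minimal sketch, assuming a synthetic dataset from make_classification (the dataset, its size, and the number of trees are my own illustrative choices, not from the docs). It uses the oob_score=True option and reads the fitted oob_score_ attribute, which is an accuracy, so the OOB error is one minus that value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, purely for illustration (an assumption, not from the thread).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# bootstrap=True is the default; oob_score=True asks the forest to score each
# observation using only the trees whose bootstrap sample did not contain it.
clf = RandomForestClassifier(
    n_estimators=100,
    bootstrap=True,
    oob_score=True,
    random_state=0,
)
clf.fit(X, y)

# oob_score_ is the OOB accuracy, so the OOB error is its complement.
oob_error = 1.0 - clf.oob_score_
print(f"OOB error: {oob_error:.3f}")
```

With very few trees some observations may never be out-of-bag, so the estimate is only meaningful once the forest is reasonably large.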

If you draw the samples with replacement, there is a high chance that some
of the z_i will not be included in the bootstrap sample used to train a
given tree, even though that sample has the same size as the original
dataset. That tree is then used in the estimation of the OOB error for
those z_i. I hope this makes it a little clearer.
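
To make the "high chance" concrete: when n indices are drawn with replacement from n observations, a given observation is missed with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. Here is a small sketch using only the standard library; the function name oob_fraction and the sample size are hypothetical, chosen just for this illustration, and this is not how scikit-learn draws its samples internally.

```python
import random


def oob_fraction(n_samples, seed=0):
    """Fraction of observations left out of one bootstrap sample."""
    rng = random.Random(seed)
    # Draw n_samples indices with replacement; the set keeps the distinct ones.
    in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(in_bag) / n_samples


n = 10_000
frac = oob_fraction(n)
theory = (1 - 1 / n) ** n  # probability that a given observation is never drawn
print(f"simulated OOB fraction: {frac:.3f}, theoretical: {theory:.3f}")
```

So even though each tree sees a sample as large as the full dataset, roughly a third of the observations are out-of-bag for any single tree.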

2016-10-03 19:25 GMT+01:00 Ibrahim Dalal via scikit-learn <
scikit-learn at python.org>:

> Dear Developers,
>
> From what little I learned last night about Random Forests, each tree is
> trained on a sub-sample of the original dataset (usually drawn with
> replacement)?
>
> (Note: Please do correct me if I am not making any sense.)
>
> RandomForestClassifier has a 'bootstrap' option. The API documentation
> states the following:
>
>
>> The sub-sample size is always the same as the original input sample size
>> but the samples are drawn with replacement if bootstrap=True (default).
>
>
> Now, what I am not able to understand is - if the entire dataset is used
> to train each of the trees, then how does the classifier estimate the OOB
> error? None of the entries in the dataset would be out-of-bag for any of
> the trees. (Pardon me if all this sounds BS)
>
> Help this mere mortal.
>
> Thanks

-- 
Yours sincerely,
https://www.linkedin.com/in/alexey-dral
Alexey A. Dral