[scikit-learn] is RandomForest random samples or random features?

Dale T Smith Dale.T.Smith at macys.com
Tue Sep 13 08:23:34 EDT 2016


Each tree is built using a random sample with replacement from the provided training data. The data not in the sample is used to calculate the out-of-bag score. The “bag” is the sampled data.

The “random” refers to several aspects of the algorithm: the random sampling of training rows for each tree, and the random sampling of features at each node.

So for i in 1 to n_estimators:
                Draw a random sample, with replacement, from the training data
                Build a tree on that sample. At each node, only a random subset of the features (see max_features) is considered, and the best split among those features is chosen.
                Use the rest of the training data, not in the sample, to calculate the out-of-bag score
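
The loop above can be sketched by hand with plain decision trees. This is only an illustration under assumptions, not scikit-learn's actual implementation: the function name fit_forest_sketch, the iris data and the parameter values are all made up for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_forest_sketch(X, y, n_estimators=25, max_features="sqrt", seed=0):
    """Hypothetical helper: bagged trees with per-node feature subsampling."""
    rng = np.random.RandomState(seed)
    n_samples = X.shape[0]
    classes = np.unique(y)
    # Accumulated out-of-bag class probabilities, one row per training sample.
    votes = np.zeros((n_samples, len(classes)))
    trees = []
    for _ in range(n_estimators):
        # Random samples: draw rows with replacement (the "bag").
        bag = rng.randint(0, n_samples, n_samples)
        oob_mask = np.ones(n_samples, dtype=bool)
        oob_mask[bag] = False
        # Random features: max_features limits the candidates at each node.
        tree = DecisionTreeClassifier(
            max_features=max_features,
            random_state=rng.randint(0, 2**31 - 1),
        )
        tree.fit(X[bag], y[bag])
        trees.append(tree)
        # Only rows this tree never saw contribute to the out-of-bag votes.
        oob_idx = np.flatnonzero(oob_mask)
        if oob_idx.size:
            # A bag may miss a class, so align the tree's columns to `classes`.
            cols = np.searchsorted(classes, tree.classes_)
            votes[np.ix_(oob_idx, cols)] += tree.predict_proba(X[oob_idx])
    # Score only the rows that were out-of-bag for at least one tree.
    covered = votes.sum(axis=1) > 0
    oob_pred = classes[votes[covered].argmax(axis=1)]
    oob_score = float(np.mean(oob_pred == y[covered]))
    return trees, oob_score


iris = load_iris()
trees, oob_score = fit_forest_sketch(iris.data, iris.target)
print(len(trees), oob_score)
```

Both kinds of randomness show up here: the bootstrap draw happens once per tree, while the feature subsampling happens inside each tree at every split.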

Random Forest already incorporates “random features”.
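
In scikit-learn itself the two sources of randomness are controlled by separate parameters, so a "random features only" variant can be had by switching the bootstrap off. A minimal sketch, assuming the iris data and arbitrary values for n_estimators and random_state:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Random samples AND random features: bootstrap rows per tree,
# and a random subset of features considered at each split.
both = RandomForestClassifier(
    n_estimators=50,
    bootstrap=True,
    max_features="sqrt",
    oob_score=True,  # needs bootstrap=True, since it uses the left-out rows
    random_state=0,
).fit(X, y)

# Random features only: every tree sees the full training set,
# so there are no out-of-bag rows and no oob_score_.
features_only = RandomForestClassifier(
    n_estimators=50,
    bootstrap=False,
    max_features="sqrt",
    random_state=0,
).fit(X, y)

print("OOB score with bootstrap:", both.oob_score_)
print("Trees without bootstrap:", len(features_only.estimators_))
```

Setting max_features to the total number of features with bootstrap=True gives the opposite case, plain bagging with random samples only.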

https://github.com/glouppe/phd-thesis

__________________________________________________________________________________________
Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science
770-658-5176 | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com

From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On Behalf Of ??
Sent: Tuesday, September 13, 2016 4:16 AM
To: scikit-learn at python.org
Subject: [scikit-learn] is RandomForest random samples or random features?

I have read the scikit-learn user guide section on RandomForest:

"""
In random forests (see RandomForestClassifier<http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier> and RandomForestRegressor<http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor> classes), each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set.
"""
But I think of RandomForest as:
"""
features ("attributes", "predictors", "independent variables") are randomly sampled
"""
Does RandomForest use random samples or random features? Where can I find a random-features version of RandomForest?
Thanks.