[scikit-learn] OOB decision function in RandomForestClassifier

Stephen Jeffrey stephen_j_jeffrey at yahoo.com.au
Fri Mar 9 16:58:49 EST 2018


Hi,
When using RandomForestClassifier on a multiclass problem with a large number of trees, would you expect the prediction for a given sample to match the OOB decision function? That is, when n_estimators is large, should the predicted class be the one with the highest OOB value for that sample?
On my 3-class problem, the oob_decision_function_ for a given sample is
[ 0.31091392  0.2982096   0.39087648]

but the prediction for that sample is the middle class (OOB=0.30), whereas I thought it should have been the last class (which has the highest OOB value, 0.39).
According to the docs:
1. The ensemble prediction is a weighted average of the prediction from each individual tree: "In contrast to the original publication [B2001], the scikit-learn implementation combines classifiers by averaging their probabilistic prediction, instead of letting each classifier vote for a single class." (taken from section 1.11.2.1 of "1.11. Ensemble methods", scikit-learn 0.19.1 documentation)
2. The OOB values for a given sample are the fraction of out-of-bag predictions for each class (see http://scikit-learn.org/stable/auto_examples/ensemble/plot_ensemble_oob.html)
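To illustrate the averaging described in the docs, here is a minimal sketch. The synthetic dataset and all parameter values are assumptions for illustration, not my actual data. It averages the per-tree predict_proba output by hand and compares the result with the forest's own predict_proba and predict.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 3-class data (an assumption; not the data from this post).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Average the class probabilities predicted by each individual tree.
mean_tree_proba = np.mean(
    [tree.predict_proba(X) for tree in clf.estimators_], axis=0
)
forest_proba = clf.predict_proba(X)

# Check whether the forest's probabilities equal the hand-computed average.
print(np.allclose(forest_proba, mean_tree_proba))

# The predicted class is the one with the highest averaged probability.
manual_pred = clf.classes_[np.argmax(forest_proba, axis=1)]
print(np.array_equal(manual_pred, clf.predict(X)))
```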
I thought the prediction for a given sample would converge to the class with the highest OOB value as the number of trees increases, and consequently thought that I could interpret the OOB values for a given sample as the probability of that sample belonging to the various classes. Is this incorrect?
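The comparison I am describing can be reproduced with a sketch like the one below. The synthetic data and parameter values are assumptions, not my real problem. One thing worth keeping in mind when reading the output: oob_decision_function_ for a sample is built only from the trees that did not have that sample in their bootstrap, whereas predict on the same training sample averages over all trees, including those that were fitted on it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 3-class data (an assumption; not the data from this post).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X, y)

# Class with the highest OOB value for each training sample.
oob_pred = clf.classes_[np.argmax(clf.oob_decision_function_, axis=1)]

# Prediction for the same training samples, using every tree in the forest.
train_pred = clf.predict(X)

oob_acc = np.mean(oob_pred == y)
train_acc = np.mean(train_pred == y)

print("OOB argmax accuracy:     ", oob_acc)
print("predict(X_train) accuracy:", train_acc)
print("samples where they differ:", int(np.sum(oob_pred != train_pred)))
```

If the two disagree on training samples, calling predict on held-out samples that no tree has seen would be a fairer comparison with the OOB values.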
Regards
Steve
