[scikit-learn] Splitting Method on RandomForestClassifier
mail at sebastianraschka.com
Tue Oct 2 14:23:00 EDT 2018
This is explained here:
"In addition, when splitting a node during the construction of the tree, the split that is chosen is no longer the best split among all features. Instead, the split that is picked is the best split among a random subset of the features."
and the "best split" (in the decision trees) among the random feature subset is based on maximizing information gain or equivalently minimizing child node impurity as described here: http://scikit-learn.org/stable/modules/tree.html#mathematical-formulation
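To make the "best split" criterion concrete, here is a minimal stand-alone sketch (not scikit-learn's actual Cython internals) of how a candidate split can be scored by Gini impurity: the chosen split is the one maximizing the decrease from the parent's impurity to the weighted average of the child impurities.

```python
# Illustrative sketch, not scikit-learn source: scoring a binary split
# by Gini impurity decrease.
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the weighted average of child impurities."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

parent = [0, 0, 0, 1, 1, 1]
# A perfect split removes all impurity:
print(impurity_decrease(parent, [0, 0, 0], [1, 1, 1]))  # 0.5
# A poor split barely reduces it:
print(impurity_decrease(parent, [0, 0, 1], [0, 1, 1]))  # ~0.0556
```

In a random forest, this score is only evaluated over the random feature subset drawn at each node (controlled by `max_features` in scikit-learn), rather than over all features.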
Looking at this, I have a question though ...
In the docs (http://scikit-learn.org/stable/modules/tree.html#mathematical-formulation) it says
"Select the parameters that minimises the impurity"
"Recurse for subsets Q_left and Q_right until the maximum allowable depth is reached"
But this isn't the whole definition, right? Shouldn't there also be a condition that if the weighted average of the child node impurities, for any candidate split, is not smaller than the parent node impurity, the tree-growing algorithm terminates?
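For what it's worth, scikit-learn's tree growing does expose such a knob via the `min_impurity_decrease` parameter (default 0.0, so a zero-gain split is not by itself a stopping condition), alongside limits like `max_depth` and `min_samples_split`. A rough sketch of that stopping check (my reading of the docs, not the library's actual code) might look like:

```python
# Illustrative sketch of a min-impurity-decrease stopping rule; scikit-learn's
# real implementation lives in Cython and also weights by sample counts.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def should_stop(parent, left, right, min_impurity_decrease=0.0):
    """Stop if the split's impurity decrease falls below the threshold."""
    n = len(parent)
    weighted_children = ((len(left) / n) * gini(left)
                         + (len(right) / n) * gini(right))
    decrease = gini(parent) - weighted_children
    return decrease < min_impurity_decrease

# This split leaves both children exactly as mixed as the parent, so with
# any positive threshold the node would not be split further:
print(should_stop([0, 0, 0, 1, 1, 1], [0, 1], [0, 0, 1, 1],
                  min_impurity_decrease=1e-7))  # True
```

With the default threshold of 0.0, a split with zero impurity decrease would still be accepted, which is consistent with the docs' "recurse until the maximum allowable depth" phrasing.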
> On Oct 2, 2018, at 10:49 AM, Guillaume Lemaître <g.lemaitre58 at gmail.com> wrote:
> In Random Forest, the best split for each feature is selected. The
> Extra Randomized Trees will make a random split instead.
> On Tue, 2 Oct 2018 at 17:43, Michael Reupold
> <michael_reupold at trimble.com> wrote:
>> Hello all,
>> I currently struggle to find information what or which specific split Methods are used on the RandomForestClassifier. Is it a random selection? A median? The best of a set of methods?
>> Kind regards
>> Michael Reupold
>> scikit-learn mailing list
>> scikit-learn at python.org
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay