[scikit-learn] Random Forest Regressor criterion

Sebastian Raschka se.raschka at gmail.com
Thu Aug 31 01:47:44 EDT 2017


Regarding MSE minimization vs. variance reduction: it's been a few years, but I remember that we had a discussion about that, in which Gilles Louppe explained that the two are identical. At the time I was confused by the Wikipedia equation at https://en.wikipedia.org/wiki/Decision_tree_learning#Variance_reduction (I didn't read carefully and somehow thought that x_i etc. referred to feature columns rather than x being the target variable :P).
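
To spell out why the two coincide, here is a short sketch in the usual textbook notation (the symbols t, t_L, t_R, N_t and \bar{y}_t are my own choice here, not copied from the Wikipedia page or from the scikit-learn docs). A regression tree node predicts the mean of the targets it contains, so the MSE inside a node is just the variance of those targets:

```latex
% Node t holds N_t samples with targets y_i; its prediction is the mean \bar{y}_t.
\mathrm{MSE}(t) = \frac{1}{N_t} \sum_{i \in t} \left( y_i - \bar{y}_t \right)^2 = \mathrm{Var}(t)

% After splitting t into children t_L and t_R (N_t = N_L + N_R),
% the MSE over the same samples is the size-weighted average of the child variances:
\mathrm{MSE}_{\text{split}} = \frac{N_L}{N_t} \mathrm{Var}(t_L) + \frac{N_R}{N_t} \mathrm{Var}(t_R)

% Variance reduction achieved by the split:
\Delta = \mathrm{Var}(t) - \frac{N_L}{N_t} \mathrm{Var}(t_L) - \frac{N_R}{N_t} \mathrm{Var}(t_R)
```

Since Var(t) is the same for every candidate split of a given node, maximizing the reduction is the same as minimizing the weighted child MSE. Note that the child variances are weighted by the fraction of samples in each child, so it is not a plain sum of the two variances.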

A better resource: I think Gilles also had a page about that in his thesis, but I currently can't find it. The thesis should be accessible from https://arxiv.org/abs/1407.7502 though, and I would recommend taking a look at "3.6.3 Finding the best binary split" and page 108+ on how it's implemented (if this is still up to date with the current implementation!?). This would probably address all your questions :).
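
In case a toy example helps, below is a rough sketch of an exhaustive split search on a single feature. This is only an illustration of the idea and not scikit-learn's actual code; the function name best_split_1d and the tiny data set are made up for this example. It scores each candidate threshold by the size-weighted variance of the two children and then checks that this score equals the MSE you get when each child predicts its own mean.

```python
import numpy as np


def best_split_1d(x, y):
    """Hypothetical helper: best binary split of one feature by weighted child variance."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y_sorted)
    best_threshold, best_impurity = None, np.inf
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate two equal feature values
        left, right = y_sorted[:i], y_sorted[i:]
        # Child variances weighted by the number of samples in each child
        impurity = (len(left) * left.var() + len(right) * right.var()) / n
        if impurity < best_impurity:
            best_impurity = impurity
            # Use the midpoint between neighbouring values as the threshold
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2.0
    return best_threshold, best_impurity


# Made-up data: two clearly separated groups of targets
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])

threshold, impurity = best_split_1d(x, y)

# MSE when each child predicts the mean of its own targets
left_mask = x <= threshold
pred = np.where(left_mask, y[left_mask].mean(), y[~left_mask].mean())
mse = np.mean((y - pred) ** 2)

print(threshold, impurity, mse)
```

In a random forest, a search like this would be repeated over the random subset of features drawn at the node, and the feature/threshold pair with the lowest score would be kept.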


> On Aug 30, 2017, at 5:50 AM, Evans J.R.A. <Jonny.Evans at soton.ac.uk> wrote:
> Hi there,
> I would like to fully understand how the Random Forest Regressor chooses how to split the data at each node.
> I understand that each tree considers a bootstrap sample of the training data, and on each split a random subset of features (using max_features) is considered. But among these features, how does the algorithm work out which is the best split to make? I am using the default criterion ‘mse’, but don’t understand the given explanation “equal to variance reduction as feature selection criterion”. Does this mean that for each possible split that could be made, the sum of variances of data in the child nodes is calculated, then the algorithm would use the split with the least sum of variances?
> Kind regards,
> Jonny Evans
> Doctoral Researcher
> Transportation Research Group
> Faculty of Engineering and the Environment
> University of Southampton
> Email: Jonny.Evans at soton.ac.uk
