[scikit-learn] Random Forest with Mean Absolute Error

Robert Slater rdslater at gmail.com
Sun Oct 23 18:37:30 EDT 2016


I searched the archives to see if this was a known issue, but could not
seem to find anyone else having the problem.

In the latest version (0.18), RandomForestRegressor has the new option of
'mae' for criterion.  However, it appears to run disproportionately longer
than the 'mse' criterion.

For example:

from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Random forest using the new 'mae' criterion (this is the slow fit)
rf_tree = 50
rf_depth = 5
rf = RandomForestRegressor(n_estimators=rf_tree, criterion='mae',
                           max_depth=rf_depth,
                           min_samples_split=4, min_samples_leaf=2,
                           max_features=0.5,
                           max_leaf_nodes=5,
                           oob_score=True, n_jobs=1, random_state=0,
                           verbose=1)

# Extra trees with the default criterion, for comparison
et_tree = 100
et = ExtraTreesRegressor(n_estimators=et_tree, max_depth=5,
                         min_samples_split=4, min_samples_leaf=2,
                         max_features=0.5, verbose=1, n_jobs=4)

# train (features) and loss (target) are my competition data, not shown here
X_train, X_test, y_train, y_test = train_test_split(
    train, loss, test_size=0.2, random_state=42)

et.fit(X_train, y_train)
rf.fit(X_train, y_train)

rf_pred = rf.predict(X_test)
et_pred = et.predict(X_test)

print(mean_absolute_error(y_test, rf_pred))
print(mean_absolute_error(y_test, et_pred))

I was using these two for a recent Kaggle competition.  If I use
"criterion='mse'" in the random forest, it takes around 1 minute to build.
Switching to 'mae' causes 100% CPU usage and at least 30 minutes of wait
time before I kill my kernel.
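
For anyone who wants to try this without the competition data, here is a
minimal timing sketch on synthetic data. The data, the sizes, and the helper
name time_fit are made up for illustration and are much smaller than my real
setup, so the absolute timings will not match the numbers above. The fallback
names 'squared_error' and 'absolute_error' are only there because releases
after 0.18 renamed the criteria; on 0.18 the first name in each pair is used.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def time_fit(criterion_names, X, y):
    """Fit a small forest with the first accepted criterion name.

    Returns (name_used, seconds). Hypothetical helper, not part of sklearn.
    """
    for name in criterion_names:
        rf = RandomForestRegressor(n_estimators=10, criterion=name,
                                   max_depth=5, random_state=0)
        start = time.perf_counter()
        try:
            rf.fit(X, y)
        except (ValueError, KeyError):
            continue  # this release does not accept the name; try the next
        return name, time.perf_counter() - start
    raise ValueError("no supported criterion among %r" % (criterion_names,))


# Synthetic regression problem (an assumption, not the Kaggle data)
rng = np.random.RandomState(0)
X = rng.rand(2000, 10)
y = 3.0 * X[:, 0] + 0.1 * rng.randn(2000)

mse_name, mse_secs = time_fit(("mse", "squared_error"), X, y)
mae_name, mae_secs = time_fit(("mae", "absolute_error"), X, y)

print("%s: %.2f s" % (mse_name, mse_secs))
print("%s: %.2f s" % (mae_name, mae_secs))
```

Increasing the number of rows should show whether the gap between the two
criteria grows with the size of the training set.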

I am not sure if the problem is on my end or if there is a real issue, so I
wanted to reach out and see if there are others.