[scikit-learn] GradientBoostingRegressor with training, validation, and test set.

Thu Jul 7 04:56:04 EDT 2016

Hi,

I am using the GradientBoostingRegressor algorithm. I randomly split the
dataset into three parts: a training set (50%), a validation set (25%),
and a test set (25%).

I understand that the training set is used for fitting the model (1), the
validation set is used for estimating prediction error for model
selection (2), and the test set is used for assessing the final chosen
model (3). However, I am not sure how to implement this. Could anyone
give an example?
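A minimal sketch of the three steps, assuming scikit-learn 0.18 or later (for `sklearn.model_selection`) and using synthetic data from `make_regression` in place of the real dataset. The candidate values for `max_depth` and `learning_rate` are hypothetical: each candidate is fit on the training set only, the one with the lowest validation MSE is kept, and only that model is scored on the test set.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ParameterGrid, train_test_split

# Synthetic data stands in for the real dataset (assumption).
X, y = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)

# 50% training; the other 50% is split evenly into validation and test (25% each).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Hypothetical candidate settings to choose between.
grid = ParameterGrid({'max_depth': [2, 4], 'learning_rate': [0.05, 0.1]})

best_mse, best_params, best_model = float('inf'), None, None
for params in grid:
    # (1) fit each candidate on the training set only
    model = GradientBoostingRegressor(n_estimators=200, subsample=0.75,
                                      random_state=42, **params)
    model.fit(X_train, y_train)
    # (2) score it on the validation set; keep the lowest-error candidate
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    if val_mse < best_mse:
        best_mse, best_params, best_model = val_mse, params, model

# (3) assess the single chosen model once on the held-out test set
test_mse = mean_squared_error(y_test, best_model.predict(X_test))
print('best params:', best_params)
print('validation MSE: %.2f, test MSE: %.2f' % (best_mse, test_mse))
```

Some people refit the chosen settings on the training and validation sets combined before the final test; that is a design choice, and the sketch above does not do it.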

Many thanks,
Olga

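from sklearn import ensemble
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split  # sklearn.cross_validation is deprecated

# X, y: the full dataset. Keep 50% for training, then split the rest evenly into test and validation.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5)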

params = {'n_estimators': 2000,
          'max_depth': 4,
          'min_samples_leaf': 4,
          'learning_rate': 0.01,
          'min_samples_split': 2,  # must be at least 2; current releases reject 1
          'subsample': 0.75,
          'random_state': 42,
          'loss': 'squared_error'}  # named 'ls' before scikit-learn 1.0

est = ensemble.GradientBoostingRegressor(**params)

est.fit(X_train, y_train)  # (1) fit on the training set

mean_squared_error(y_test, est.predict(X_test))  # (3) assess on the test set
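The code above has no step (2). One way to fill it in, sketched here as an assumption rather than the only approach, is to use the validation set to choose the number of boosting stages: `GradientBoostingRegressor.staged_predict` yields predictions after each stage, so a single fit gives the validation error for every candidate `n_estimators`. The data are synthetic, and 500 stages at `learning_rate=0.05` are used instead of 2000 at 0.01 to keep it quick.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the real dataset (assumption).
X, y = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

params = dict(max_depth=4, min_samples_leaf=4, learning_rate=0.05,
              subsample=0.75, random_state=42)

# (1) fit once on the training set with a generous number of stages
est = GradientBoostingRegressor(n_estimators=500, **params)
est.fit(X_train, y_train)

# (2) validation MSE after 1, 2, ..., 500 stages; pick the stage count with the lowest error
val_errors = [mean_squared_error(y_val, y_pred) for y_pred in est.staged_predict(X_val)]
best_n = int(np.argmin(val_errors)) + 1  # list index 0 corresponds to 1 stage

# refit on the training set with the selected number of stages
final = GradientBoostingRegressor(n_estimators=best_n, **params)
final.fit(X_train, y_train)

# (3) one final assessment on the test set
test_mse = mean_squared_error(y_test, final.predict(X_test))
print('stages chosen on validation set:', best_n)
print('test MSE: %.2f' % test_mse)
```

scikit-learn 0.20 and later also offer built-in early stopping through `n_iter_no_change` and `validation_fraction`, but that carves its own validation split out of the data passed to `fit` instead of using a separate validation set.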