[scikit-learn] GradientBoostingRegressor with training, validation, and test set.
Olga Lyashevska
o.lyashevskaya at gmail.com
Thu Jul 7 04:56:04 EDT 2016
Hi,
I am implementing the GradientBoostingRegressor algorithm. I randomly divide
the dataset into three parts: a training set (50%), a validation set (25%),
and a test set (25%).
I understand that the training set is used for model fitting (1), the
validation set is used to estimate prediction error for model selection (2),
and the test set is used to assess the final chosen model (3). However, I am
not sure how to implement this.
Could anyone give me an example?
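A minimal sketch of how the three roles could fit together. It assumes synthetic data from `make_regression` in place of the real X and y, and uses a hypothetical grid of `max_depth` values as the "model selection" choice. Each candidate is fit on the training set only, the validation set picks the winner, and the test set is used exactly once on the chosen model.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the real X, y
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)

# 50% training; the remaining 50% is split evenly into validation and test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

candidate_depths = [2, 3, 4]  # hypothetical grid, for illustration only
models = {}
val_errors = {}
for depth in candidate_depths:
    model = GradientBoostingRegressor(n_estimators=200, max_depth=depth,
                                      learning_rate=0.05, random_state=42)
    model.fit(X_train, y_train)  # (1) fit on the training set only
    # (2) estimate prediction error on the validation set
    val_errors[depth] = mean_squared_error(y_val, model.predict(X_val))
    models[depth] = model

# (2) model selection: keep the candidate with the lowest validation MSE
best_depth = min(val_errors, key=val_errors.get)
best_model = models[best_depth]

# (3) assess the chosen model once on the held-out test set
test_mse = mean_squared_error(y_test, best_model.predict(X_test))
print(f"best max_depth={best_depth}, "
      f"validation MSE={val_errors[best_depth]:.1f}, test MSE={test_mse:.1f}")
```

The test MSE is reported only for the selected model; comparing several models on the test set would turn it into a second validation set and bias the final estimate.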
Many thanks,
Olga
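from sklearn import ensemble
from sklearn.metrics import mean_squared_error
# train_test_split now lives in sklearn.model_selection (formerly sklearn.cross_validation)
from sklearn.model_selection import train_test_split

# 50% training; the other 50% is split evenly into test and validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
X_test, X_val, y_test, y_val = train_test_split(X_test, y_test,
                                                test_size=0.5)
params = {'n_estimators': 2000,
          'max_depth': 4,
          'min_samples_leaf': 4,
          'learning_rate': 0.01,
          'min_samples_split': 2,    # an integer value must be at least 2
          'subsample': 0.75,
          'random_state': 42,
          'loss': 'squared_error'}   # named 'ls' in older releases
est = ensemble.GradientBoostingRegressor(**params)
est.fit(X_train, y_train)  # 1: fit on the training set
# 2: X_val, y_val are not used yet; this is the missing model-selection step
mean_squared_error(y_test, est.predict(X_test))  # 3: assess on the test set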
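One way to fill in step 2 for a boosting model is to let the validation set choose the number of boosting stages. The sketch below assumes synthetic data in place of X and y, and reduces `n_estimators` to 500 to keep it quick; `staged_predict` yields the prediction after each stage, so the validation curve costs a single fit.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the real X, y
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

params = dict(max_depth=4, min_samples_leaf=4, learning_rate=0.01,
              subsample=0.75, random_state=42)

est = GradientBoostingRegressor(n_estimators=500, **params)
est.fit(X_train, y_train)  # (1) fit on the training set

# (2) validation MSE after every stage; index i corresponds to i + 1 trees
val_mse = np.array([mean_squared_error(y_val, pred)
                    for pred in est.staged_predict(X_val)])
best_n = int(np.argmin(val_mse)) + 1

# Refit with the selected number of stages, still on the training set only
final = GradientBoostingRegressor(n_estimators=best_n, **params)
final.fit(X_train, y_train)

# (3) assess the chosen model once on the test set
test_mse = mean_squared_error(y_test, final.predict(X_test))
print(f"stages chosen on validation: {best_n}, test MSE: {test_mse:.1f}")
```

Newer scikit-learn releases (0.20 and later) also provide built-in early stopping through the `n_iter_no_change` and `validation_fraction` parameters, which hold out a validation fraction of the training data internally; an explicit split as above keeps the validation set visible and fixed.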