[scikit-learn] Scikit learn GridSearchCV fit method ValueError Found array with 0 sample

Michał Nowotka mmmnow at gmail.com
Fri Jul 8 11:22:05 EDT 2016


Sorry for cross-posting,
but I don't know where it is better to get help with my problem.
I'm working on a VM with Jupyter notebook server installed.
From time to time I add new notebooks and re-evaluate old ones to see
if they still work.

This notebook stopped working because of changes in the scikit-learn API
that made some parameters obsolete:


I've created a corrected version of the notebook here:


But I'm stuck in cell 36 on this code:

from sklearn import cross_validation
from sklearn.cross_validation import KFold, StratifiedKFold
from sklearn.grid_search import GridSearchCV

X_traina, X_testa, y_traina, y_testa = cross_validation.train_test_split(
    x, y, test_size=0.95,

params = {'min_samples_split': [8], 'max_depth': [20],
          'min_samples_leaf': [1], 'n_estimators': [200]}
cv = KFold(n=len(X_traina), n_folds=10, shuffle=True)
cv_stratified = StratifiedKFold(y_traina, n_folds=5)
gs = GridSearchCV(custom_forest, params, cv=cv_stratified, verbose=1,
                  refit=True)

This gives me:

ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a
minimum of 1 is required.

I don't understand this, because when I print the shapes of the splits:

print (X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)

I'm getting:

((78, 491), (1489, 491), (78,), (1489,))

Interestingly, if I change the test_size parameter to 0.88 (as in
the corrected notebook) it works, and this is the highest value
for which it works. For this value, the shapes are:

((188, 491), (1379, 491), (188,), (1379,))
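
As a sanity check on the two sets of shapes, here is a minimal sketch of the
split arithmetic. It assumes the test count is the fractional test_size
rounded up and that the remaining rows go to training; split_sizes is a
hypothetical helper written for this illustration, not a scikit-learn
function. It only confirms that both splits are non-empty and does not
explain the zero-sample error.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the rest is training data.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# Total number of rows implied by the reported shapes (78 + 1489).
n_samples = 78 + 1489

for test_size in (0.95, 0.88):
    n_train, n_test = split_sizes(n_samples, test_size)
    print(test_size, n_train, n_test)
```

Under that assumption the sketch gives 78 training rows for 0.95 and 188 for
0.88, which matches the shapes printed by the notebook.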

So the question is: what should I change in my code to make it work
with test_size set to 0.95 as well?

Kind regards,

Michal Nowotka
