[scikit-learn] Scikit learn GridSearchCV fit method ValueError Found array with 0 sample

Maciek Wójcikowski maciek at wojcikowski.pl
Mon Jul 11 07:33:28 EDT 2016


Shouldn't you pass class labels (e.g. binary) instead of continuous data?
If you wish to stick to logK values and keep the distribution unchanged,
then you'd better reduce the number of classes (e.g. round the values to
the nearest integer).
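
A minimal sketch of that rounding step, using only the first twelve logK
values from the StratifiedKFold printout quoted further down; the names
logk_sample and binned are made up for illustration:

```python
from collections import Counter

# First twelve logK values from the StratifiedKFold printout in this thread.
logk_sample = [5.43, 8.74, 8.1, 6.55, 7.66, 6.52,
               8.6, 7.1, 6.4, 8.05, 7.89, 6.68]

# Treated as class labels, the raw values give one class per sample.
raw_counts = Counter(logk_sample)

# Rounding to the nearest integer collapses them into a few larger classes.
binned = [int(round(v)) for v in logk_sample]
binned_counts = Counter(binned)

print(len(raw_counts), "classes before rounding")
print(len(binned_counts), "classes after rounding:", sorted(binned_counts.items()))
```

The binned values, rather than the raw logK values, would then be the
labels handed to StratifiedKFold.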

It might be the case that the per-class sample counts for each fold are
floored, so you get 0 samples in some cases.
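
A rough way to see the arithmetic (this is a simplified model of per-class
fold sizes, not scikit-learn's actual implementation, and the helper
per_fold_counts is hypothetical):

```python
from collections import Counter

def per_fold_counts(labels, n_folds):
    """Return {class: samples available per fold} using integer division."""
    return {cls: count // n_folds for cls, count in Counter(labels).items()}

# Every distinct continuous value is its own class with a single member,
# so each class contributes 1 // 5 == 0 samples per fold.
continuous = [5.43, 8.74, 8.1, 6.55, 7.66, 6.52]
print(per_fold_counts(continuous, n_folds=5))

# A class with at least n_folds members contributes a non-zero share.
balanced = [0] * 10 + [1] * 5
print(per_fold_counts(balanced, n_folds=5))
```

If most classes come out as 0 here, stratifying on those labels is unlikely
to give sensible folds.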

----
Best regards,
Maciek Wójcikowski
maciek at wojcikowski.pl

2016-07-11 13:16 GMT+02:00 Michał Nowotka <mmmnow at gmail.com>:

> Hi Maciek,
>
> Thanks for the suggestion. I think the problem is indeed related to
> StratifiedKFold, because the code works fine if I use KFold instead.
> However, when I print the StratifiedKFold object, it looks fine to me:
>
> sklearn.cross_validation.StratifiedKFold(labels=[
>   5.43  8.74  8.1   6.55  7.66  6.52  8.6   7.1   6.4   8.05  7.89  6.68
>   8.06  6.17  5.5   7.96  5.78  6.    7.74  5.83  6.51  6.31  6.68  9.22
>   6.07  7.06  7.12  8.64  5.72  6.4   7.64  5.74  7.41  6.49  6.81  7.1
>   7.66  6.68  7.05  6.28  5.49  6.35  6.9   6.2   7.51  5.65  9.3   5.84
>   6.92  5.75  6.92  8.8   7.04  5.81  5.73  5.31  7.13  7.66  6.98  5.93
>   8.24  6.96  8.22  7.27  7.34  5.91  5.57  6.5   7.28  6.74  4.92  6.88
>   5.8   9.15  6.63  6.37  8.66  6.4 ], n_folds=5, shuffle=False,
>   random_state=None)
>
>
> On Fri, Jul 8, 2016 at 10:42 PM, Maciek Wójcikowski
> <maciek at wojcikowski.pl> wrote:
> > Hi Michał,
> >
> > What are the class counts in that set? Maybe there is a problem with
> > generating stratified subsamples (e.g. some classes get fewer than 1
> > sample)?
> >
> > ----
> > Best regards,
> > Maciek Wójcikowski
> > maciek at wojcikowski.pl
> >
> > 2016-07-08 17:22 GMT+02:00 Michał Nowotka <mmmnow at gmail.com>:
> >>
> >> Hi,
> >>
> >> Sorry for cross-posting
> >> (http://stackoverflow.com/questions/38263933/scikit-learn-gridsearchcv-fit-method-valueerror-found-array-with-0-sample),
> >> but I don't know which is the better place to get help with my problem.
> >> I'm working on a VM with Jupyter notebook server installed.
> >> From time to time I add new notebooks and reevaluate old ones to see
> >> if they still work.
> >>
> >> This notebook stopped working because of changes in the scikit-learn
> >> API that made some parameters obsolete:
> >>
> >> https://github.com/chembl/mychembl/blob/master/ipython_notebooks/10_myChEMBL_machine_learning.ipynb
> >>
> >> I've created a corrected version of the notebook here:
> >>
> >> https://gist.github.com/anonymous/676c55cc501ffa48fecfcc1e1252d433
> >>
> >> But I'm stuck in cell 36 on this code:
> >>
> >> from sklearn import cross_validation
> >> from sklearn.cross_validation import KFold, StratifiedKFold
> >> from sklearn.grid_search import GridSearchCV
> >>
> >> # x, y and custom_forest are assumed to come from earlier notebook cells
> >> X_traina, X_testa, y_traina, y_testa = cross_validation.train_test_split(
> >>     x, y, test_size=0.95, random_state=23)
> >>
> >> params = {'min_samples_split': [8], 'max_depth': [20],
> >>           'min_samples_leaf': [1], 'n_estimators': [200]}
> >> cv = KFold(n=len(X_traina), n_folds=10, shuffle=True)
> >> cv_stratified = StratifiedKFold(y_traina, n_folds=5)
> >> gs = GridSearchCV(custom_forest, params,
> >>                   cv=cv_stratified, verbose=1, refit=True)
> >> gs.fit(X_traina, y_traina)
> >>
> >> This gives me:
> >>
> >> ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a
> >> minimum of 1 is required.
> >>
> >> I don't understand this, because when I print the shapes of the samples:
> >>
> >> print (X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)
> >>
> >> I'm getting:
> >>
> >> ((78, 491), (1489, 491), (78,), (1489,))
> >>
> >> Interestingly, if I change the test_size parameter to 0.88 (like in
> >> the example corrected notebook) it works and this is the highest value
> >> where it works. For this value, the shapes are:
> >>
> >> ((188, 491), (1379, 491), (188,), (1379,))
> >>
> >> So the question is - what should I change in my code to make it work
> >> for test_size set to 0.95 as well?
> >>
> >> Kind regards,
> >>
> >> Michal Nowotka
> >> _______________________________________________
> >> scikit-learn mailing list
> >> scikit-learn at python.org
> >> https://mail.python.org/mailman/listinfo/scikit-learn