[scikit-learn] Scikit learn GridSearchCV fit method ValueError Found array with 0 sample
Maciek Wójcikowski
maciek at wojcikowski.pl
Fri Jul 8 17:42:06 EDT 2016
Hi Michał,
What are the class counts in that set? Maybe there is a problem with
generating stratified subsamples (e.g. some classes end up with fewer
than 1 sample in a fold)?
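
A quick way to check could look like the sketch below. It is only an
illustration: `y_example` is made-up data standing in for your `y_traina`,
`classes_below_n_folds` is a hypothetical helper name, and I am assuming the
labels are a flat sequence of hashable values.

```python
from collections import Counter


def classes_below_n_folds(y, n_folds):
    """Return {label: count} for every class with fewer samples than n_folds."""
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < n_folds}


# Made-up labels standing in for y_traina: 78 samples, only 3 in class 1.
y_example = [0] * 75 + [1] * 3

# Per-class sample counts.
print(Counter(y_example))

# Classes that cannot contribute a sample to each of the 5 stratified folds.
print(classes_below_n_folds(y_example, n_folds=5))
```

If the second print shows any class for your real labels, at least one fold
cannot receive a sample of that class, which would be worth ruling out first.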
----
Best regards,
Maciek Wójcikowski
maciek at wojcikowski.pl
2016-07-08 17:22 GMT+02:00 Michał Nowotka <mmmnow at gmail.com>:
> Hi,
>
> Sorry for cross-posting
> (
> http://stackoverflow.com/questions/38263933/scikit-learn-gridsearchcv-fit-method-valueerror-found-array-with-0-sample
> ),
> but I don't know which is the better place to get help with my problem.
> I'm working on a VM with Jupyter notebook server installed.
> From time to time I add new notebooks and reevaluate old ones to see
> if they still work.
>
> This notebook stopped working because of some changes in the scikit-learn
> API, and some parameters became obsolete:
>
>
> https://github.com/chembl/mychembl/blob/master/ipython_notebooks/10_myChEMBL_machine_learning.ipynb
>
> I've created a corrected version of the notebook here:
>
> https://gist.github.com/anonymous/676c55cc501ffa48fecfcc1e1252d433
>
> But I'm stuck in cell 36 on this code:
>
> # custom_forest, x and y are defined in earlier notebook cells.
> from sklearn import cross_validation
> from sklearn.cross_validation import KFold, StratifiedKFold
> from sklearn.grid_search import GridSearchCV
>
> # Hold out 95% of the data, leaving a small training set.
> X_traina, X_testa, y_traina, y_testa = cross_validation.train_test_split(
>     x, y, test_size=0.95, random_state=23)
>
> params = {'min_samples_split': [8], 'max_depth': [20],
>           'min_samples_leaf': [1], 'n_estimators': [200]}
> cv = KFold(n=len(X_traina), n_folds=10, shuffle=True)
> cv_stratified = StratifiedKFold(y_traina, n_folds=5)
> gs = GridSearchCV(custom_forest, params, cv=cv_stratified,
>                   verbose=1, refit=True)
> gs.fit(X_traina, y_traina)
>
> This gives me:
>
> ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a
> minimum of 1 is required.
>
> I don't understand this, because when I print the shapes of the arrays:
>
> print (X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)
>
> I'm getting:
>
> ((78, 491), (1489, 491), (78,), (1489,))
>
> Interestingly, if I change the test_size parameter to 0.88 (as in
> the corrected example notebook), it works, and this is the highest value
> for which it works. For this value, the shapes are:
>
> ((188, 491), (1379, 491), (188,), (1379,))
>
> So the question is - what should I change in my code to make it work
> for test_size set to 0.95 as well?
>
> Kind regards,
>
> Michal Nowotka