[scikit-learn] Bm25 pull request

Joel Nothman joel.nothman at gmail.com
Mon Jul 11 20:26:54 EDT 2016


CircleCI checks the documentation build (although apparently it ignores
changes only to docstrings). Travis runs all tests on a Linux system.
AppVeyor tests on Windows.

On 12 July 2016 at 08:11, Basil Beirouti <basilbeirouti at gmail.com> wrote:

>
> Hi,
>
> Joel thanks for pointing out the indentation issue. I have fixed it.
>
> Can someone explain what the 3 tests that were automatically run on my
> code are? And why did the AppVeyor and Travis ones fail?
>
> Sincerely,
> Basil Beirouti
> Sent from my iPhone
>
> > On Jul 11, 2016, at 11:00 AM, scikit-learn-request at python.org wrote:
> >
> > Today's Topics:
> >
> >   1. Re: Scikit learn GridSearchCV fit method ValueError Found
> >      array with 0 sample (Maciek Wójcikowski)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Mon, 11 Jul 2016 13:33:28 +0200
> > From: Maciek Wójcikowski <maciek at wojcikowski.pl>
> > To: Scikit-learn user and developer mailing list
> >    <scikit-learn at python.org>
> > Subject: Re: [scikit-learn] Scikit learn GridSearchCV fit method
> >    ValueError Found array with 0 sample
> > Message-ID:
> >    <CAH2JJR1BqHC0PzNv7uaugkQ9GDBUTev4yuJ1qOWuJa=eWZ1wnQ at mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Shouldn't you pass labels (binary) instead of continuous data? If you wish
> > to stick to logK's and keep the distribution unchanged then you'd better
> > reduce the number of classes (e.g. round the values to the nearest integer?).
> >
> > It might be the case that the counts per class are floored and you get 0
> > for some cases.
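
The point about class counts can be illustrated with a small sketch. It uses only the first twelve target values printed further down in this thread and the standard library; the helper name `class_counts` is made up for illustration, and the integer division is only an assumed stand-in for how a stratified splitter might share a class across folds, not a description of scikit-learn's internals.

```python
from collections import Counter

# First twelve continuous targets from the StratifiedKFold printout in the thread.
y = [5.43, 8.74, 8.1, 6.55, 7.66, 6.52, 8.6, 7.1, 6.4, 8.05, 7.89, 6.68]
n_folds = 5

def class_counts(labels):
    """Count how many samples share each distinct label value (hypothetical helper)."""
    return Counter(labels)

# Treating raw floats as classes: every distinct value becomes its own class.
raw = class_counts(y)

# Rounding to the nearest integer merges them into a few coarser classes.
binned = class_counts([int(round(v)) for v in y])

print(len(raw), "classes before rounding,", len(binned), "after rounding")

# Assumed illustration: a class with one member cannot contribute a whole
# sample to each of five folds, so its floored per-fold share is zero.
print("floored per-fold share of the smallest raw class:",
      min(raw.values()) // n_folds)
```

Even after rounding, some classes in this small excerpt still have fewer members than there are folds, so coarser bins or plain KFold may be needed.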
> >
> > ----
> > Pozdrawiam,  |  Best regards,
> > Maciek Wójcikowski
> > maciek at wojcikowski.pl
> >
> > 2016-07-11 13:16 GMT+02:00 Michał Nowotka <mmmnow at gmail.com>:
> >
> >> Hi Maciek,
> >>
> >> Thanks for the suggestion. I think the problem is indeed related to
> >> StratifiedKFold, because if I use KFold instead the code works fine.
> >> However, if I print the StratifiedKFold object it looks fine to me:
> >>
> >> sklearn.cross_validation.StratifiedKFold(labels=[ 5.43  8.74  8.1
> >> 6.55  7.66  6.52  8.6   7.1   6.4   8.05  7.89  6.68
> >>  8.06  6.17  5.5   7.96  5.78  6.    7.74  5.83  6.51  6.31  6.68  9.22
> >>  6.07  7.06  7.12  8.64  5.72  6.4   7.64  5.74  7.41  6.49  6.81  7.1
> >>  7.66  6.68  7.05  6.28  5.49  6.35  6.9   6.2   7.51  5.65  9.3   5.84
> >>  6.92  5.75  6.92  8.8   7.04  5.81  5.73  5.31  7.13  7.66  6.98  5.93
> >>  8.24  6.96  8.22  7.27  7.34  5.91  5.57  6.5   7.28  6.74  4.92  6.88
> >>  5.8   9.15  6.63  6.37  8.66  6.4 ], n_folds=5, shuffle=False,
> >> random_state=None)
> >>
> >>
> >> On Fri, Jul 8, 2016 at 10:42 PM, Maciek Wójcikowski
> >> <maciek at wojcikowski.pl> wrote:
> >>> Hi Michał,
> >>>
> >>> What are the class counts in that set? Maybe there is a problem with
> >>> generating stratified subsamples (e.g. some classes get below 1 sample)?
> >>>
> >>> ----
> >>> Pozdrawiam,  |  Best regards,
> >>> Maciek Wójcikowski
> >>> maciek at wojcikowski.pl
> >>>
> >>> 2016-07-08 17:22 GMT+02:00 Michał Nowotka <mmmnow at gmail.com>:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Sorry for cross posting
> >>>>
> >>>> (http://stackoverflow.com/questions/38263933/scikit-learn-gridsearchcv-fit-method-valueerror-found-array-with-0-sample)
> >>>> but I don't know where it is better to get help with my problem.
> >>>> I'm working on a VM with Jupyter notebook server installed.
> >>>> From time to time I add new notebooks and reevaluate old ones to see
> >>>> if they still work.
> >>>>
> >>>> This notebook stopped working due to some changes in the scikit-learn
> >>>> API, and some parameters became obsolete:
> >>>> https://github.com/chembl/mychembl/blob/master/ipython_notebooks/10_myChEMBL_machine_learning.ipynb
> >>>>
> >>>> I've created a corrected version of the notebook here:
> >>>>
> >>>> https://gist.github.com/anonymous/676c55cc501ffa48fecfcc1e1252d433
> >>>>
> >>>> But I'm stuck in cell 36 on this code:
> >>>>
> >>>> from sklearn import cross_validation
> >>>> from sklearn.cross_validation import KFold, StratifiedKFold
> >>>> from sklearn.grid_search import GridSearchCV
> >>>>
> >>>> # Hold out 95% of the data for testing, leaving a small training set.
> >>>> X_traina, X_testa, y_traina, y_testa = cross_validation.train_test_split(
> >>>>     x, y, test_size=0.95, random_state=23)
> >>>>
> >>>> params = {'min_samples_split': [8], 'max_depth': [20],
> >>>>           'min_samples_leaf': [1], 'n_estimators': [200]}
> >>>> cv = KFold(n=len(X_traina), n_folds=10, shuffle=True)
> >>>> # Stratifies on y_traina, treating each distinct value as a class.
> >>>> cv_stratified = StratifiedKFold(y_traina, n_folds=5)
> >>>> gs = GridSearchCV(custom_forest, params, cv=cv_stratified,
> >>>>                   verbose=1, refit=True)
> >>>> gs.fit(X_traina, y_traina)
> >>>>
> >>>> This gives me:
> >>>>
> >>>> ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a
> >>>> minimum of 1 is required.
> >>>>
> >>>> Now I don't understand this because when I print shapes of the samples:
> >>>>
> >>>> print (X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)
> >>>>
> >>>> I'm getting:
> >>>>
> >>>> ((78, 491), (1489, 491), (78,), (1489,))
> >>>>
> >>>> Interestingly, if I change the test_size parameter to 0.88 (like in
> >>>> the example corrected notebook) it works and this is the highest value
> >>>> where it works. For this value, the shapes are:
> >>>>
> >>>> ((188, 491), (1379, 491), (188,), (1379,))
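
As a rough check of the two sets of shapes quoted above, the split arithmetic can be sketched in plain Python. The function name `split_sizes` is hypothetical, and rounding the test count up is an assumption; it happens to reproduce the numbers reported in the message, but the thread does not confirm how train_test_split rounds.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the rest goes to training.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

# Total number of rows implied by the shapes reported in the message.
n_samples = 78 + 1489

for test_size in (0.95, 0.88):
    n_train, n_test = split_sizes(n_samples, test_size)
    # With 5 folds, each fold only holds about n_train // 5 training rows.
    print(test_size, "->", n_train, "train,", n_test, "test,",
          n_train // 5, "rows per fold")
```

With test_size=0.95 only a few dozen rows are left for training, so any stratification over many distinct target values has very little to work with.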
> >>>>
> >>>> So the question is - what should I change in my code to make it work
> >>>> for test_size set to 0.95 as well?
> >>>>
> >>>> Kind regards,
> >>>>
> >>>> Michal Nowotka
> >>>> _______________________________________________
> >>>> scikit-learn mailing list
> >>>> scikit-learn at python.org
> >>>> https://mail.python.org/mailman/listinfo/scikit-learn