On Tue, Feb 2, 2016 at 8:45 AM, Pauli Virtanen <pav@iki.fi> wrote:
> 01.02.2016, 23:25, Ralf Gommers wrote:
> [clip]
>> So: it would really help if someone could pick up the automation part of
>> this and improve the stack testing, so the numpy release manager doesn't
>> have to do this.
>
> quick hack: https://github.com/pv/testrig
>
> Not that I'm necessarily volunteering to maintain the setup, though, but
> if it seems useful, move it under numpy org.

That's pretty cool :-). I was also fiddling with a similar idea, though a much less fancy one: my little script cheats by using miniconda to fetch pre-built versions of some packages, then runs each package's tests twice, against numpy 1.10.2 (as shipped by anaconda) and against numpy master, and diffs the two runs (with a bit of massaging to make things more readable, like summarizing warnings):

    https://travis-ci.org/njsmith/numpy/builds/106865202

Search for "#####" to jump between sections of the output.
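
The core of it is tiny; roughly this (a sketch, not the actual script -- the env paths and pytest-as-runner are invented):

    import difflib
    import subprocess

    def test_output(python, package):
        # Run `package`'s test suite under one environment's python and
        # capture what it prints. (Runner and paths here are made up for
        # the sketch; the real thing uses whatever each package ships with.)
        res = subprocess.run(
            [python, "-m", "pytest", "--pyargs", package],
            capture_output=True, text=True,
        )
        return res.stdout.splitlines()

    old = test_output("envs/numpy-1.10.2/bin/python", "scipy")
    new = test_output("envs/numpy-master/bin/python", "scipy")

    # Diff the two runs. The real script first normalizes the output --
    # e.g. collapsing duplicate warnings into counts -- so the diff stays
    # readable instead of drowning in 1500 identical lines.
    for line in difflib.unified_diff(old, new, "1.10.2", "master", lineterm=""):
        print(line)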

Some observations:

testing matplotlib this way doesn't work, because it needs special test data files that anaconda doesn't ship :-/

scipy:
  one new failure, in test_nanmedian_all_axis
  250 calls to np.testing.rand (wtf), 92 calls to the deprecated random_integers, and 3 uses of datetime64 with timezones (examples below). Also, for some reason, the new numpy gives more "invalid value encountered in greater"-type warnings.
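
For reference, those scipy warnings boil down to patterns like these (my best guesses at the offending calls, with the replacements numpy's deprecation messages point at):

    import numpy as np

    # random_integers is deprecated in favor of randint; note that
    # randint's upper bound is exclusive where random_integers' was
    # inclusive, so these two are equivalent:
    #   np.random.random_integers(1, 6, size=10)   # deprecated, includes 6
    new = np.random.randint(1, 7, size=10)          # includes 6, excludes 7

    # np.testing.rand is an odd old alias; np.random.rand is presumably
    # what was meant:
    x = np.random.rand(3)

    # datetime64 with an explicit timezone offset also warns on master;
    # the fix is a timezone-naive string:
    t = np.datetime64('2016-02-02T08:45')  # instead of '...T08:45Z'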

astropy:
  two weird failures that hopefully some astropy person will look into, plus two spurious failures due to over-strict warning tests, which break whenever an unrelated new warning shows up (see the sketch below)
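
By "over-strict" I mean tests shaped roughly like this (hypothetical code, not lifted from astropy):

    import warnings

    def do_something():
        # stand-in for the code under test: emits the warning the test
        # cares about, plus an unrelated one, the way a new numpy
        # deprecation would
        warnings.warn("expected text", UserWarning)
        warnings.warn("some new numpy deprecation", DeprecationWarning)

    # Brittle pattern: asserting on the total number of warnings
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        do_something()
        # a test doing `assert len(w) == 1` here fails as soon as numpy
        # grows a new warning:
        print(len(w))  # 2

    # Robust pattern: filter for the warning you actually care about
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        do_something()
        relevant = [x for x in w if "expected text" in str(x.message)]
        assert len(relevant) == 1  # passes regardless of unrelated warnings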

scikit-learn:
  several new failures: 1 "invalid slice" (?), 2 "OverflowError: value too large to convert to int". No idea what's up with these; hopefully some scikit-learn person will investigate?
  also 2 np.ma view warnings, 16 warnings about multi-character strings used where "C" or "F" was expected (example below), and 1514 (!!) calls to the deprecated random_integers
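
My guess at what the multi-character-string warnings mean: numpy used to look only at the first character of order= arguments, so spelled-out names happened to work, and master now warns about anything but the documented spellings. Something like:

    import numpy as np

    a = np.ones((3, 4))

    try:
        b = a.flatten(order="Fortran")  # only the first char used to matter;
                                        # warns on master, an error later
    except ValueError as e:
        print("rejected:", e)

    c = a.flatten(order="F")            # the documented spelling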

pandas:
  zero new failures, and the only new warnings are about NaT, as expected. I guess their whole "run the tests against numpy master" setup works!

statsmodels:
  an absolute disaster: 261 new failures, I think mostly because numpy master has gotten pickier about implicit float->int conversions (examples below), plus a few more "invalid slice" errors.
  also 102 np.ma view warnings.
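
The float->int pickiness is about patterns like these, each of which silently truncated the float on 1.10 (and the float slice bounds are presumably where the "invalid slice" errors come from):

    import numpy as np

    a = np.arange(10)

    # Each of these silently truncated the float on numpy <= 1.10; on
    # master they warn or raise:
    for label, bad_call in [
        ("float index", lambda: a[3.0]),
        ("float slice bounds", lambda: a[1.0:5.0]),   # -> "invalid slice"?
        ("float shape", lambda: np.zeros(5.0)),
        ("float dimension", lambda: a.reshape(2.0, 5)),
    ]:
        try:
            bad_call()
            print(label, "-> still works (with a warning?)")
        except (TypeError, IndexError) as e:
            print(label, "->", type(e).__name__, ":", e)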

I don't have a great sense of whether the statsmodels breakages will actually impact users, or whether they mostly trace back to, say, one bad utility function that's only used in the test suite (probably not the latter, since the failures have different tracebacks). If this is typical, though, we may need to back those integer changes out and replace them with a really loud, obnoxious warning for a release or two :-/ The other problem is that statsmodels hasn't done a release since 2014, so even an upstream fix wouldn't reach users for a while :-/
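
For concreteness, the loud-warning route could be a sketch like this (not a real patch; np.VisibleDeprecationWarning is a UserWarning subclass, so it shows up even under Python's default filters that hide DeprecationWarning):

    import warnings
    import numpy as np

    def _int_from_float(x):
        # Hypothetical: keep the old truncating behavior for a release or
        # two, but make it impossible to miss.
        warnings.warn(
            "using a float where an integer is required is deprecated "
            "and will be an error in a future numpy release",
            np.VisibleDeprecationWarning,
            stacklevel=2,
        )
        return int(x)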

-n

--
Nathaniel J. Smith -- https://vorpus.org