[Numpy-discussion] Fwd: Multi-distribution Linux wheels - please test

Nathaniel Smith njs at pobox.com
Mon Feb 8 20:26:49 EST 2016


On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
[...]
> I can't replicate the segfault with manylinux wheels and scipy.  On
> the other hand, I get a new test error for numpy from manylinux, scipy
> from manylinux, like this:
>
> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>
> ======================================================================
> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line
> 197, in runTest
>     self.test(*self.arg)
>   File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py",
> line 658, in eigenhproblem_general
>     assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
> line 892, in assert_array_almost_equal
>     precision=decimal)
>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py",
> line 713, in assert_array_compare
>     raise AssertionError(msg)
> AssertionError:
> Arrays are not almost equal to 4 decimals
>
> (mismatch 100.0%)
>  x: array([ 0.,  0.,  0.], dtype=float32)
>  y: array([ 1.,  1.,  1.])
>
> ----------------------------------------------------------------------
> Ran 1507 tests in 14.928s
>
> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>
> This is a very odd error, which we don't get when running over a numpy
> installed from source, linked to ATLAS, and doesn't happen when
> running the tests via:
>
> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>
> So, something about the copy of numpy (linked to openblas) is
> affecting the results of scipy (also linked to openblas), and only
> with a particular environment / test order.
>
> If you'd like to try and see whether y'all can do a better job of
> debugging than me:
>
> # Run this script inside a docker container started with this incantation:
> # docker run -ti --rm ubuntu:12.04 /bin/bash
> apt-get update
> apt-get install -y python curl
> apt-get install libpython2.7  # this won't be necessary with next
> iteration of manylinux wheel builds
> curl -LO https://bootstrap.pypa.io/get-pip.py
> python get-pip.py
> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
> python -c 'import scipy.linalg; scipy.linalg.test()'

I just tried this and on my laptop it completed without error.

Best guess is that we're dealing with some memory corruption bug
inside openblas, so it's getting perturbed by things like exactly what
other calls to openblas have happened (which is different depending on
whether numpy is linked to openblas), and which core type openblas has
detected.

On my laptop, which *doesn't* show the problem, running with
OPENBLAS_VERBOSE=2 says "Core: Haswell".

Guess the next step is checking what core type the failing machines
use, and running valgrind... anyone have a good valgrind suppressions
file?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org



More information about the NumPy-Discussion mailing list