[Numpy-discussion] Fwd: Multi-distribution Linux wheels - please test
Matthew Brett
matthew.brett at gmail.com
Mon Feb 8 21:04:18 EST 2016
On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> [...]
>> I can't replicate the segfault with manylinux wheels and scipy. On
>> the other hand, I get a new test error for numpy from manylinux, scipy
>> from manylinux, like this:
>>
>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>
>> ======================================================================
>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
>>     self.test(*self.arg)
>>   File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py", line 658, in eigenhproblem_general
>>     assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 892, in assert_array_almost_equal
>>     precision=decimal)
>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 713, in assert_array_compare
>>     raise AssertionError(msg)
>> AssertionError:
>> Arrays are not almost equal to 4 decimals
>>
>> (mismatch 100.0%)
>> x: array([ 0., 0., 0.], dtype=float32)
>> y: array([ 1., 1., 1.])
>>
>> ----------------------------------------------------------------------
>> Ran 1507 tests in 14.928s
>>
>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>
>> This is a very odd error, which we don't get when running against a
>> numpy installed from source and linked to ATLAS, and which doesn't
>> happen when running the tests via:
>>
>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>
>> So, something about the copy of numpy (linked to openblas) is
>> affecting the results of scipy (also linked to openblas), and only
>> with a particular environment / test order.
>>
>> If you'd like to try and see whether y'all can do a better job of
>> debugging than me:
>>
>> # Run this script inside a docker container started with this incantation:
>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>> apt-get update
>> apt-get install -y python curl
>> # libpython2.7 won't be necessary with the next iteration of manylinux wheel builds
>> apt-get install -y libpython2.7
>> curl -LO https://bootstrap.pypa.io/get-pip.py
>> python get-pip.py
>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>> python -c 'import scipy.linalg; scipy.linalg.test()'
>
> I just tried this and on my laptop it completed without error.
>
> Best guess is that we're dealing with some memory corruption bug
> inside openblas, so it's getting perturbed by things like exactly what
> other calls to openblas have happened (which is different depending on
> whether numpy is linked to openblas), and which core type openblas has
> detected.
>
> On my laptop, which *doesn't* show the problem, running with
> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>
> Guess the next step is checking what core type the failing machines
> use, and running valgrind... anyone have a good valgrind suppressions
> file?

My machine (which does give the failure) reports

    Core: Core2

with OPENBLAS_VERBOSE=2.
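
In case it helps others report which core type their machine detects, here is a rough sketch of how the check could be scripted. The helper names (parse_openblas_core, detect_openblas_core) are made up for illustration. It assumes that an OpenBLAS-linked numpy prints a "Core: <name>" line to stdout or stderr on import when OPENBLAS_VERBOSE=2 is set, and it uses subprocess.run, so it needs Python 3.5 or later rather than the 2.7 used in the container above.

```python
import os
import re
import subprocess
import sys

# Matches a line such as "Core: Haswell" or "Core: Core2".
CORE_LINE = re.compile(r"^Core:\s*(\S+)", re.MULTILINE)


def parse_openblas_core(output):
    """Return the name from the first 'Core: <name>' line, or None if absent."""
    match = CORE_LINE.search(output)
    return match.group(1) if match else None


def detect_openblas_core(python=sys.executable):
    """Import numpy in a child process with OPENBLAS_VERBOSE=2 set.

    Assumption: the numpy seen by `python` is linked to an OpenBLAS build
    that reports its detected core when this variable is set. Returns
    None when no such line is printed (for example, an ATLAS build).
    """
    env = dict(os.environ, OPENBLAS_VERBOSE="2")
    proc = subprocess.run(
        [python, "-c", "import numpy"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # capture both streams together
        universal_newlines=True,
    )
    return parse_openblas_core(proc.stdout)
```

Calling detect_openblas_core() on the failing and the passing machines and comparing the results would be enough to see whether the failures line up with a particular core type.
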
Matthew