[Numpy-discussion] Fwd: Multi-distribution Linux wheels - please test

Nathaniel Smith njs at pobox.com
Mon Feb 8 21:07:04 EST 2016


On Mon, Feb 8, 2016 at 6:04 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> On Mon, Feb 8, 2016 at 5:26 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Mon, Feb 8, 2016 at 4:37 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>> [...]
>>> I can't replicate the segfault with manylinux wheels and scipy.  On
>>> the other hand, I get a new test error for numpy from manylinux, scipy
>>> from manylinux, like this:
>>>
>>> $ python -c 'import scipy.linalg; scipy.linalg.test()'
>>>
>>> ======================================================================
>>> FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
>>> ----------------------------------------------------------------------
>>> Traceback (most recent call last):
>>>   File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
>>>     self.test(*self.arg)
>>>   File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/tests/test_decomp.py", line 658, in eigenhproblem_general
>>>     assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype])
>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 892, in assert_array_almost_equal
>>>     precision=decimal)
>>>   File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 713, in assert_array_compare
>>>     raise AssertionError(msg)
>>> AssertionError:
>>> Arrays are not almost equal to 4 decimals
>>>
>>> (mismatch 100.0%)
>>>  x: array([ 0.,  0.,  0.], dtype=float32)
>>>  y: array([ 1.,  1.,  1.])
>>>
>>> ----------------------------------------------------------------------
>>> Ran 1507 tests in 14.928s
>>>
>>> FAILED (KNOWNFAIL=4, SKIP=1, failures=1)
>>>
>>> This is a very odd error, which we don't get when running over a numpy
>>> installed from source, linked to ATLAS, and doesn't happen when
>>> running the tests via:
>>>
>>> nosetests /usr/local/lib/python2.7/dist-packages/scipy/linalg
>>>
>>> So, something about the copy of numpy (linked to openblas) is
>>> affecting the results of scipy (also linked to openblas), and only
>>> with a particular environment / test order.
>>>
>>> If you'd like to try and see whether y'all can do a better job of
>>> debugging than me:
>>>
>>> # Run this script inside a docker container started with this incantation:
>>> # docker run -ti --rm ubuntu:12.04 /bin/bash
>>> apt-get update
>>> apt-get install -y python curl
>>> # libpython2.7 won't be necessary with the next iteration of manylinux wheel builds
>>> apt-get install -y libpython2.7
>>> curl -LO https://bootstrap.pypa.io/get-pip.py
>>> python get-pip.py
>>> pip install -f https://nipy.bic.berkeley.edu/manylinux numpy scipy nose
>>> python -c 'import scipy.linalg; scipy.linalg.test()'
>>
>> I just tried this and on my laptop it completed without error.
>>
>> Best guess is that we're dealing with some memory corruption bug
>> inside openblas, so it's getting perturbed by things like exactly what
>> other calls to openblas have happened (which is different depending on
>> whether numpy is linked to openblas), and which core type openblas has
>> detected.
>>
>> On my laptop, which *doesn't* show the problem, running with
>> OPENBLAS_VERBOSE=2 says "Core: Haswell".
>>
>> Guess the next step is checking what core type the failing machines
>> use, and running valgrind... anyone have a good valgrind suppressions
>> file?
>
> My machine (which does give the failure) gives
>
> Core: Core2
>
> with OPENBLAS_VERBOSE=2

Yep, that allows me to reproduce it:

root@f7153f0cc841:/# OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=Core2 python -c 'import scipy.linalg; scipy.linalg.test()'
Core: Core2
[...]
======================================================================
FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4))
----------------------------------------------------------------------
[...]

So this is indeed sounding like an OpenBLAS issue... next stop
valgrind, I guess :-/
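
In case it helps anyone else check their own machine, here is a rough sketch of the comparison above as a script. It reruns the scipy.linalg tests in a fresh interpreter once per core type and reports whether the test_eigh failure shows up. The helper names (openblas_env, run_linalg_tests, eigh_failed) are made up for illustration, the list of core types is just the two seen in this thread, and it assumes the installed OpenBLAS honors OPENBLAS_CORETYPE the way it did in the run above.

```python
import os
import subprocess
import sys

# Core names to try; "Core2" and "Haswell" are the two reported in this thread.
CORE_TYPES = ["Core2", "Haswell"]

# Same one-liner used above to trigger the failure.
TEST_SNIPPET = "import scipy.linalg; scipy.linalg.test()"


def openblas_env(core_type, base_env=None):
    """Return a copy of the environment that forces an OpenBLAS core type."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_VERBOSE"] = "2"
    env["OPENBLAS_CORETYPE"] = core_type
    return env


def run_linalg_tests(core_type):
    """Run the scipy.linalg tests in a new interpreter; return combined output."""
    proc = subprocess.Popen(
        [sys.executable, "-c", TEST_SNIPPET],
        env=openblas_env(core_type),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the test runner reports on stderr
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    return output


def eigh_failed(output):
    """True if the test_eigh failure header appears in the test output."""
    return any(
        line.startswith("FAIL: test_decomp.test_eigh")
        for line in output.splitlines()
    )


if __name__ == "__main__":
    for core in CORE_TYPES:
        status = "FAIL" if eigh_failed(run_linalg_tests(core)) else "ok"
        print("%-10s %s" % (core, status))
```

Each core type gets its own process because OpenBLAS picks its kernels when the library is loaded, so changing the variable inside a running interpreter would not be expected to have any effect.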

-- 
Nathaniel J. Smith -- https://vorpus.org


