[Numpy-discussion] NA masks for NumPy are ready to test

Mark Wiebe mwwiebe at gmail.com
Fri Aug 19 14:12:10 EDT 2011


On Fri, Aug 19, 2011 at 11:07 AM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

>
>
> On Fri, Aug 19, 2011 at 11:55 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>
>> On Fri, Aug 19, 2011 at 10:48 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>> > On Fri, Aug 19, 2011 at 7:15 AM, Bruce Southey <bsouthey at gmail.com>
>> wrote:
>> >>
>> >> On 08/18/2011 04:43 PM, Mark Wiebe wrote:
>> >>
>> >> It's taken a lot of changes to get the NA mask support to its current
>> >> point, but the code is ready for some testing now. You can read the
>> >> work-in-progress release notes here:
>> >>
>> >>
>> https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst
>> >> To try it out, check out the missingdata branch from my github account,
>> >> here, and build in the standard way:
>> >> https://github.com/m-paradox/numpy
>> >> The things most important to test are:
>> >> * Confirm that existing code still works correctly. I've tested against
>> >> SciPy and matplotlib.
>> >> * Confirm that the performance of code not using NA masks is the same or
>> >> better.
>> >> * Try to do computations with the NA values, find places they don't work
>> >> yet, and nominate unimplemented functionality important to you to be next
>> >> on the development list. The release notes have a preliminary list of
>> >> implemented/unimplemented functions.
>> >> * Report any crashes, build problems, or unexpected behaviors.
>> >> In addition to adding the NA mask, I've also added features and made a few
>> >> performance changes here and there, such as letting reductions like sum
>> >> take lists of axes instead of only a single axis or all of them. These
>> >> changes affect various bugs
>> >> like http://projects.scipy.org/numpy/ticket/1143 and
>> >> http://projects.scipy.org/numpy/ticket/533.
>> >> Thanks!
>> >> Mark
>> >> Here's a small example run using NAs:
>> >> >>> import numpy as np
>> >> >>> np.__version__
>> >> '2.0.0.dev-8a5e2a1'
>> >> >>> a = np.random.rand(3,3,3)
>> >> >>> a.flags.maskna = True
>> >> >>> a[np.random.rand(3,3,3) < 0.5] = np.NA
>> >> >>> a
>> >> array([[[NA, NA,  0.11511708],
>> >>         [ 0.46661454,  0.47565512, NA],
>> >>         [NA, NA, NA]],
>> >>        [[NA,  0.57860351, NA],
>> >>         [NA, NA,  0.72012669],
>> >>         [ 0.36582123, NA,  0.76289794]],
>> >>        [[ 0.65322748,  0.92794386, NA],
>> >>         [ 0.53745165,  0.97520989,  0.17515083],
>> >>         [ 0.71219688,  0.5184328 ,  0.75802805]]])
>> >> >>> np.mean(a, axis=-1)
>> >> array([[NA, NA, NA],
>> >>        [NA, NA, NA],
>> >>        [NA,  0.56260412,  0.66288591]])
>> >> >>> np.std(a, axis=-1)
>> >> array([[NA, NA, NA],
>> >>        [NA, NA, NA],
>> >>        [NA,  0.32710662,  0.10384331]])
>> >> >>> np.mean(a, axis=-1, skipna=True)
>> >>
>> >>
>> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474:
>> >> RuntimeWarning: invalid value encountered in true_divide
>> >>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
>> >> array([[ 0.11511708,  0.47113483,         nan],
>> >>        [ 0.57860351,  0.72012669,  0.56435958],
>> >>        [ 0.79058567,  0.56260412,  0.66288591]])
>> >> >>> np.std(a, axis=-1, skipna=True)
>> >>
>> >>
>> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707:
>> >> RuntimeWarning: invalid value encountered in true_divide
>> >>   um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
>> >>
>> >>
>> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730:
>> >> RuntimeWarning: invalid value encountered in true_divide
>> >>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
>> >> array([[ 0.        ,  0.00452029,         nan],
>> >>        [ 0.        ,  0.        ,  0.19853835],
>> >>        [ 0.13735819,  0.32710662,  0.10384331]])
>> >> >>> np.std(a, axis=(1,2), skipna=True)
>> >> array([ 0.16786895,  0.15498008,  0.23811937])
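For readers without the missingdata branch, here is a rough sketch of the same three ideas (propagating missing values, skipping them, and reducing over a tuple of axes). It assumes NaN as a stand-in for NA and uses np.nanmean and np.nanstd in place of skipna=True. That substitution is an assumption made for illustration, not the branch's API, and the small array is made up.

```python
import warnings

import numpy as np

# NaN stands in for NA here (an assumption; the branch uses np.NA and a mask).
a = np.array([[[np.nan, np.nan, 0.5],
               [1.0, 3.0, np.nan]],
              [[np.nan, np.nan, np.nan],
               [2.0, 4.0, 6.0]]])

# A plain reduction propagates the missing value, much as NA did above.
plain = np.mean(a, axis=-1)

# nanmean skips missing entries. A row with no data at all yields nan and
# emits a RuntimeWarning, which is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    skipped = np.nanmean(a, axis=-1)

# A tuple of axes reduces over several axes in a single call.
per_block = np.nanstd(a, axis=(1, 2))

print(plain)
print(skipped)
print(per_block)
```

As in the session above, a slice that is entirely missing still comes out as nan when missing values are skipped, because there is nothing left to average.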
>> >>
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion at scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >>
>> >> Hi,
>> >> That is great news!
>> >> (Python2.x will be another email.)
>> >>
>> >> Python3.1 and Python3.2 failed while building
>> >> 'multiarraymodule_onefile.o', but I could not see any obvious reason.
>> >
>> > I've pushed a change to fix the Python 3 build. The problem was a use
>> > of Py_TPFLAGS_CHECKTYPES, which no longer exists in Python 3 because its
>> > behavior is always the default now. Tested with 3.2.
>> > Thanks!
>> > Mark
>> >
>> >>
>> >> I had removed my build directory and then ran 'python3 setup.py build',
>> >> but I saw this message:
>> >> Running from numpy source directory.
>> >> numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch
>> detected,
>> >> the C API version numbers have to be updated. Current C api version is
>> 6,
>> >> with checksum ef5688af03ffa23dd8e11734f5b69313, but recorded checksum
>> for C
>> >> API version 6 in codegen_dir/cversions.txt is
>> >> e61d5dc51fa1c6459328266e215d6987. If functions were added in the C API,
>> you
>> >> have to update C_API_VERSION  in numpy/core/setup_common.py.
>> >>   MismatchCAPIWarning)
>> >>
>> >> Upstream of the build log is below.
>> >>
>> >> Bruce
>> >>
>> >> In file included from
>> >> numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0:
>> >> numpy/core/src/multiarray/na_singleton.c: At top level:
>> >> numpy/core/src/multiarray/na_singleton.c:708:25: error:
>> >> ‘Py_TPFLAGS_CHECKTYPES’ undeclared here (not in a function)
>> >> numpy/core/src/multiarray/common.c:48:1: warning: ‘_use_default_type’
>> >> defined but not used
>> >> numpy/core/src/multiarray/ctors.h:93:1: warning: ‘_arrays_overlap’
>> >> declared ‘static’ but never defined
>> >> numpy/core/src/multiarray/scalartypes.c.src:2251:1: warning:
>> >> ‘gentype_getsegcount’ defined but not used
>> >> numpy/core/src/multiarray/scalartypes.c.src:2269:1: warning:
>> >> ‘gentype_getcharbuf’ defined but not used
>> >> numpy/core/src/multiarray/mapping.c:110:1: warning: ‘_array_ass_item’
>> >> defined but not used
>> >> numpy/core/src/multiarray/number.c:266:1: warning: ‘array_divide’
>> defined
>> >> but not used
>> >> numpy/core/src/multiarray/number.c:464:1: warning:
>> ‘array_inplace_divide’
>> >> defined but not used
>> >> numpy/core/src/multiarray/buffer.c:25:1: warning: ‘array_getsegcount’
>> >> defined but not used
>> >> numpy/core/src/multiarray/buffer.c:58:1: warning: ‘array_getwritebuf’
>> >> defined but not used
>> >> numpy/core/src/multiarray/buffer.c:71:1: warning: ‘array_getcharbuf’
>> >> defined but not used
>> >> numpy/core/src/multiarray/na_mask.c:681:1: warning:
>> >> ‘PyArray_GetMaskInversionFunction’ defined but not used
>> >> In file included from numpy/core/src/multiarray/scalartypes.c.src:25:0,
>> >>                  from
>> >> numpy/core/src/multiarray/multiarraymodule_onefile.c:10:
>> >> numpy/core/src/multiarray/_datetime.h:9:1: warning: function
>> declaration
>> >> isn’t a prototype
>> >> In file included from
>> >> numpy/core/src/multiarray/multiarraymodule_onefile.c:13:0:
>> >> numpy/core/src/multiarray/datetime.c:33:1: warning: function
>> declaration
>> >> isn’t a prototype
>> >> In file included from
>> >> numpy/core/src/multiarray/multiarraymodule_onefile.c:17:0:
>> >> numpy/core/src/multiarray/arraytypes.c.src: In function ‘VOID_getitem’:
>> >> numpy/core/src/multiarray/arraytypes.c.src:643:9: warning: passing
>> >> argument 2 of ‘PyArray_SetBaseObject’ from incompatible pointer type
>> >>
>> >>
>> build/src.linux-x86_64-3.2/numpy/core/include/numpy/__multiarray_api.h:763:12:
>> >> note: expected ‘struct PyObject *’ but argument is of type ‘struct
>> >> PyArrayObject *’
>> >> In file included from
>> >> numpy/core/src/multiarray/multiarraymodule_onefile.c:44:0:
>> >> numpy/core/src/multiarray/nditer_pywrap.c: In function
>> >> ‘npyiter_subscript’:
>> >> numpy/core/src/multiarray/nditer_pywrap.c:2395:29: warning: passing
>> >> argument 1 of ‘PySlice_GetIndices’ from incompatible pointer type
>> >> /usr/local/include/python3.2m/sliceobject.h:38:5: note: expected
>> ‘struct
>> >> PyObject *’ but argument is of type ‘struct PySliceObject *’
>> >> numpy/core/src/multiarray/nditer_pywrap.c: In function
>> >> ‘npyiter_ass_subscript’:
>> >> numpy/core/src/multiarray/nditer_pywrap.c:2440:29: warning: passing
>> >> argument 1 of ‘PySlice_GetIndices’ from incompatible pointer type
>> >> /usr/local/include/python3.2m/sliceobject.h:38:5: note: expected
>> ‘struct
>> >> PyObject *’ but argument is of type ‘struct PySliceObject *’
>> >> In file included from
>> >> numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0:
>> >> numpy/core/src/multiarray/na_singleton.c: At top level:
>> >> numpy/core/src/multiarray/na_singleton.c:708:25: error:
>> >> ‘Py_TPFLAGS_CHECKTYPES’ undeclared here (not in a function)
>> >> numpy/core/src/multiarray/common.c:48:1: warning: ‘_use_default_type’
>> >> defined but not used
>> >> numpy/core/src/multiarray/ctors.h:93:1: warning: ‘_arrays_overlap’
>> >> declared ‘static’ but never defined
>> >> numpy/core/src/multiarray/scalartypes.c.src:2251:1: warning:
>> >> ‘gentype_getsegcount’ defined but not used
>> >> numpy/core/src/multiarray/scalartypes.c.src:2269:1: warning:
>> >> ‘gentype_getcharbuf’ defined but not used
>> >> numpy/core/src/multiarray/mapping.c:110:1: warning: ‘_array_ass_item’
>> >> defined but not used
>> >> numpy/core/src/multiarray/number.c:266:1: warning: ‘array_divide’
>> defined
>> >> but not used
>> >> numpy/core/src/multiarray/number.c:464:1: warning:
>> ‘array_inplace_divide’
>> >> defined but not used
>> >> numpy/core/src/multiarray/buffer.c:25:1: warning: ‘array_getsegcount’
>> >> defined but not used
>> >> numpy/core/src/multiarray/buffer.c:58:1: warning: ‘array_getwritebuf’
>> >> defined but not used
>> >> numpy/core/src/multiarray/buffer.c:71:1: warning: ‘array_getcharbuf’
>> >> defined but not used
>> >> numpy/core/src/multiarray/na_mask.c:681:1: warning:
>> >> ‘PyArray_GetMaskInversionFunction’ defined but not used
>> >> error: Command "gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall
>> >> -Wstrict-prototypes -fPIC -Inumpy/core/include
>> >> -Ibuild/src.linux-x86_64-3.2/numpy/core/include/numpy
>> >> -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core
>> >> -Inumpy/core/src/npymath -Inumpy/core/src/multiarray
>> -Inumpy/core/src/umath
>> >> -Inumpy/core/src/npysort -Inumpy/core/include
>> >> -I/usr/local/include/python3.2m
>> >> -Ibuild/src.linux-x86_64-3.2/numpy/core/src/multiarray
>> >> -Ibuild/src.linux-x86_64-3.2/numpy/core/src/umath -c
>> >> numpy/core/src/multiarray/multiarraymodule_onefile.c -o
>> >>
>> build/temp.linux-x86_64-3.2/numpy/core/src/multiarray/multiarraymodule_onefile.o"
>> >> failed with exit status 1
>> >>
>> >>
>> >>
>> >>
>> Thanks for the prompt responses.
>>
>> That fixes the build problem for both Python3.1 and Python3.2.
>>
>> I got some test errors below but I guess you are working on those.
>>
>>
>> Bruce
>>
>>
>>
>> $ python3 -c "import numpy; numpy.test()"
>> Running unit tests for numpy
>> NumPy version 2.0.0.dev-965a5c6
>> NumPy is installed in /usr/lib64/python3.2/site-packages/numpy
>> Python version 3.2 (r32:88445, Feb 21 2011, 21:11:06) [GCC 4.6.0
>> 20110212 (Red Hat 4.6.0-0.7)]
>> nose version 1.0.0
>>
>> ..............S.......EFF.....E............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K...................................................................................................................................................................................................K..................................................................................................K......................K..........................................................................................................S......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................./usr/lib64/python3.2/site-packages/numpy/lib/format.py:575:
>> ResourceWarning: unclosed file <_io.BufferedReader
>> name='/tmp/tmpfmmo7x'>
>>  mode=mode, offset=offset)
>>
>> ......................................................................................................................................................................................................................../usr/lib64/python3.2/subprocess.py:460:
>> ResourceWarning: unclosed file <_io.BufferedReader name=3>
>>  return Popen(*popenargs, **kwargs).wait()
>> /usr/lib64/python3.2/subprocess.py:460: ResourceWarning: unclosed file
>> <_io.BufferedReader name=8>
>>  return Popen(*popenargs, **kwargs).wait()
>>
>> ....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
>> ======================================================================
>> ERROR: test_datetime_array_str (test_datetime.TestDateTime)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>  File
>> "/usr/lib64/python3.2/site-packages/numpy/core/tests/test_datetime.py",
>> line 510, in test_datetime_array_str
>>    assert_equal(str(a), "['2011-03-16' '1920-01-01' '2013-05-19']")
>>  File "/usr/lib64/python3.2/site-packages/numpy/core/numeric.py",
>> line 1400, in array_str
>>    return array2string(a, max_line_width, precision, suppress_small,
>> ' ', "", str)
>>  File "/usr/lib64/python3.2/site-packages/numpy/core/arrayprint.py",
>> line 459, in array2string
>>    separator, prefix, formatter=formatter)
>>  File "/usr/lib64/python3.2/site-packages/numpy/core/arrayprint.py",
>> line 331, in _array2string
>>    _summaryEdgeItems, summary_insert)[:-1]
>>  File "/usr/lib64/python3.2/site-packages/numpy/core/arrayprint.py",
>> line 502, in _formatArray
>>    word = format_function(a[-i]) + separator
>>  File "/usr/lib64/python3.2/site-packages/numpy/core/arrayprint.py",
>> line 770, in __call__
>>    casting=self.casting)
>> TypeError: Cannot create a local timezone-based date string from a
>> NumPy datetime without forcing 'unsafe' casting
>>
>> ======================================================================
>> ERROR: test_datetime_divide (test_datetime.TestDateTime)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>  File
>> "/usr/lib64/python3.2/site-packages/numpy/core/tests/test_datetime.py",
>> line 926, in test_datetime_divide
>>    assert_equal(tda / tdb, 6.0 / 9.0)
>> TypeError: internal error: could not find appropriate datetime inner
>> loop in true_divide ufunc
>>
>> ======================================================================
>> FAIL: test_datetime_as_string (test_datetime.TestDateTime)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>  File
>> "/usr/lib64/python3.2/site-packages/numpy/core/tests/test_datetime.py",
>> line 1166, in test_datetime_as_string
>>    '1959')
>>  File "/usr/lib64/python3.2/site-packages/numpy/testing/utils.py",
>> line 313, in assert_equal
>>    raise AssertionError(msg)
>> AssertionError:
>> Items are not equal:
>>  ACTUAL: b'1959'
>>  DESIRED: '1959'
>>
>> ======================================================================
>> FAIL: test_datetime_as_string_timezone (test_datetime.TestDateTime)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>  File
>> "/usr/lib64/python3.2/site-packages/numpy/core/tests/test_datetime.py",
>> line 1277, in test_datetime_as_string_timezone
>>    '2010-03-15T06:30Z')
>>  File "/usr/lib64/python3.2/site-packages/numpy/testing/utils.py",
>> line 313, in assert_equal
>>    raise AssertionError(msg)
>> AssertionError:
>> Items are not equal:
>>  ACTUAL: b'2010-03-15T06:30Z'
>>  DESIRED: '2010-03-15T06:30Z'
>>
>> ----------------------------------------------------------------------
>> Ran 3063 tests in 37.701s
>>
>> FAILED (KNOWNFAIL=4, SKIP=2, errors=2, failures=2)
>>
>
> The 3.2 test errors aren't new. I'd fix the tests except I'm not sure if
> Mark wants to modify the datetime stuff instead.
>

I left them largely untouched because I found it weird that the 'S' data
type doesn't return strings in Python 3... I guess maybe the
datetime_as_string function should convert to 'U' data type on Python 3
after building the 'S' array to work around this design choice. I'll look at
it after the NA stuff is wrapped up.

-Mark
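
To illustrate the 'S' versus 'U' point above, here is a minimal sketch, assuming Python 3 and any reasonably recent NumPy. The sample strings are taken from the failing tests quoted earlier. The conversion via astype is only one possible way to get str elements and is not necessarily what datetime_as_string ended up doing.

```python
import numpy as np

# An 'S' (byte string) array holding the values from the failing tests.
s_arr = np.array([b"1959", b"2010-03-15T06:30Z"], dtype="S")

# On Python 3 the elements are bytes, so they do not compare equal to str.
first_is_bytes = isinstance(s_arr[0], bytes)

# Converting to the 'U' (unicode) data type yields str elements instead.
u_arr = s_arr.astype("U")
first_is_str = isinstance(u_arr[0], str)

print(first_is_bytes, first_is_str, u_arr[0] == "1959")
```

This mirrors the ACTUAL: b'1959' versus DESIRED: '1959' mismatch in the test output: the value is right, only the element type differs.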


>
> Chuck
>
>