[Numpy-discussion] Masking through generator arrays

Mark Wiebe mwwiebe at gmail.com
Thu May 10 18:34:02 EDT 2012


On Thu, May 10, 2012 at 5:27 PM, Dag Sverre Seljebotn <
d.s.seljebotn at astro.uio.no> wrote:

> On 05/10/2012 08:23 PM, Chris Barker wrote:
> > On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn
> > <d.s.seljebotn at astro.uio.no>  wrote:
> >> What would serve me? I use NumPy as a glorified "double*".
> >
> >> all I want is my glorified
> >> "double*". I'm probably not a representative user.)
> >
> > Actually, I think you are representative of a LOT of users -- it
> > turns out, whether or not Jim Hugunin originally was thinking this
> > way, that numpy arrays are really powerful because they provide BOTH
> > a nifty, full-featured array object in Python AND a wrapper around a
> > generic "double*" (actually char*, that could be any type).
> >
> > This is a really widely used feature, and has become even more so
> > with Cython's numpy support.
> >
> > That is one of my concerns about the "bit pattern" idea -- we've then
> > created a new binary type that no other standard software understands
> > -- that looks like a lot of work to me to deal with, or even worse,
> > ripe for weird, non-obvious errors in code that access that good-old
> > char*.
> >
> > So I'm happier with a mask implementation -- more memory, yes, but it
> > seems more robust and easy to deal with from outside code.
>
> It's very interesting that you consider masks easier to integrate with
> C/C++ code than bitpatterns. I guess everybody's experience (and every
> C/C++/Fortran code base) is different.
>
> >
> > But either way, Dag's key point is right on -- in Cython (or any other
> > code) -- we need to make sure it's easy to get a regular old pointer
> > to a regular old C array, and not get something else by accident.
>
> I'm sorry if I caused any confusion -- I didn't mean to suggest that
> anybody would ever remove the ability of getting a pointer to an
> unmasked array.
>
> There is a problem that's being discussed of the opposite nature:
>
> With masked arrays, the current situation in NumPy trunk is that if
> you're presented with a masked array, and do not explicitly check for a
> mask (i.e., all existing code), you'll transparently and without warning
> "unmask" it -- that is, each masked element reads as the last value it
> held before NA was assigned. This is the case whether you use PEP 3118
> (np.ndarray[double] or double[:]), or PyArray_DATA.
>
> According to the NEP, you should really get an exception when accessing
> through PEP 3118, but this seems to not be implemented. I don't know
> whether this was a conscious change or a lack of implementation (?).
>

This was an error; I've made a pull request to fix it.
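
To make the silent unmasking described in the quoted text concrete, here is a
minimal sketch. It uses numpy.ma as a stand-in, which is an assumption made
only for illustration: numpy.ma is a different implementation from the
NA-mask feature in trunk, but mask-unaware code sees the same kind of result
when it coerces the input or reads it through the buffer interface.

```python
import numpy as np

# Stand-in for an NA-masked array (numpy.ma, not the NA-mask feature itself).
a = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# Mask-unaware code usually coerces its input to a plain ndarray first.
raw = np.asarray(a)

# The buffer interface (PEP 3118) also exposes the underlying values.
through_buffer = memoryview(a).tolist()

# The mask is gone and the hidden value 2.0 is visible again.
print(type(raw).__name__, raw)
print(through_buffer)

# The mask-aware sum skips the hidden element; the unmasked sum does not.
print(a.sum(), raw.sum())
```

Nothing in this snippet warns the caller that a value was hidden, which is the
behaviour under discussion.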


> PyArray_DATA will continue to transparently unmask data. However, with
> Travis' proposal of making a new 'ndmasked' type, old code will be
> protected; it will raise an exception for masked arrays instead of
> transparently unmasking, giving the user a chance to work around it (or
> update the code to work with masks).
>

In searching for example code, the examples I found and the numpy
documentation recommend using the PyArray_FromAny or related functions to
sanitize the array before use. This provides a place to stop NA-masked
arrays and raise an exception. Is there a lot of code out there which isn't
following this practice?
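
As a rough Python-level sketch of what I mean by a sanitizing step, the
helper below coerces its input to a plain array and refuses anything that
carries masked values. The name as_plain_array is hypothetical, and numpy.ma
stands in for NA-masked arrays; this is not the C-level PyArray_FromAny
behaviour, only an illustration of the pattern.

```python
import numpy as np

def as_plain_array(obj, dtype=np.float64):
    """Hypothetical helper: return a contiguous plain ndarray or raise.

    Input that carries masked values is rejected rather than silently
    unmasked, so mask-unaware code never sees hidden data.
    """
    if isinstance(obj, np.ma.MaskedArray) and np.ma.is_masked(obj):
        raise TypeError("input has masked values; this routine is not mask-aware")
    # Coerce lists, scalars and array subclasses to a base-class ndarray.
    return np.ascontiguousarray(obj, dtype=dtype)

# Ordinary input passes through.
plain = as_plain_array([1, 2, 3])

# Masked input is stopped at the boundary.
masked = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
try:
    as_plain_array(masked)
    rejected = False
except TypeError:
    rejected = True
```

Code that already funnels its arguments through one such entry point has a
natural place to raise the exception.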

Cheers,
Mark


> Regarding new code that you write to be mask-aware, fear not -- you can
> use PyArray_DATA and PyArray_MASKNA_DATA to get the pointers. You can't
> really access the mask using np.ndarray[uint8] or uint8[:], but it
> wouldn't be a problem for NumPy to provide such access for Cython users.
>
> Regarding native Cython support for masks, bitpatterns would be a quick
> job and an uncontroversial feature: we just need to agree with NumPy on
> an extension to the PEP 3118 format string, and then it takes a few
> hours to implement. Masks would require quite some hashing out on the
> Cython email list to figure out whether and how we would want to
> support them, and would take considerably more development work as
> well. How we'd even do that is much more vague to me.
>
> Dag
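
The mask-aware access pattern mentioned in the quoted text (one pointer to
the data, one to the mask) can be sketched in Python as a pair of parallel
arrays. This again assumes numpy.ma as a stand-in; masked_sum is a
hypothetical name, and numpy.ma uses True for a hidden element, a convention
the NA-mask design in trunk may not share.

```python
import numpy as np

def masked_sum(a):
    """Hypothetical example: sum only the elements that are not masked."""
    data = np.ma.getdata(a)       # plain ndarray of stored values
    mask = np.ma.getmaskarray(a)  # boolean ndarray, True where masked
    total = 0.0
    # Walk both buffers in step, as C code would with two pointers.
    for value, hidden in zip(data.ravel(), mask.ravel()):
        if not hidden:
            total += float(value)
    return total

a = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
print(masked_sum(a))
```

A plain ndarray also works here, since getmaskarray returns an all-False mask
for input that has none.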