[Numpy-discussion] Masking through generator arrays

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Thu May 10 18:27:13 EDT 2012


On 05/10/2012 08:23 PM, Chris Barker wrote:
> On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> What would serve me? I use NumPy as a glorified "double*".
>
>> all I want is my glorified
>> "double*". I'm probably not a representative user.)
>
> Actually, I think you are representative of a LOT of users -- it
> turns out, whether or not Jim Hugunin originally was thinking this
> way, that numpy arrays are really powerful because they provide BOTH a
> nifty, full-featured array object in Python, AND a wrapper around a
> generic "double*" (actually char*, which could be any type).
>
> This is a really widely used feature, and has become even more so
> with Cython's numpy support.
>
> That is one of my concerns about the "bit pattern" idea -- we've then
> created a new binary type that no other standard software understands
> -- that looks like a lot of work to me to deal with, or even worse,
> ripe for weird, non-obvious errors in code that accesses that good-old
> char*.
>
> So I'm happier with a mask implementation -- more memory, yes, but it
> seems more robust and easier to deal with from outside code.

It's very interesting that you consider masks easier to integrate with 
C/C++ code than bitpatterns. I guess everybody's experience (and every 
C/C++/Fortran code base) is different.
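
To make the comparison concrete, here is a minimal sketch of the two representations using nothing but plain NumPy. It is not the NEP's implementation: the names `data`, `mask`, `packed`, and `is_na` are made up for illustration, and `NA_BITS` is a hypothetical choice of NaN payload, not a value NumPy defines.

```python
import numpy as np

# Mask approach: the payload stays an ordinary float64 buffer and
# validity lives in a second, separate array.
data = np.array([1.0, 2.0, 3.0])
mask = np.array([False, True, False])   # True marks a missing element here

# Bit-pattern approach: one particular NaN payload is reserved to mean NA.
# NA_BITS is a hypothetical choice for this sketch only.
NA_BITS = np.uint64(0x7FF00000000007A2)
packed = data.copy()
packed.view(np.uint64)[1] = NA_BITS     # write the raw bits, bypassing float math


def is_na(values):
    """Return True where an element carries the reserved NA bit pattern."""
    return values.view(np.uint64) == NA_BITS


na_flags = is_na(packed)
```

With the mask, code that only sees `data` still gets a plain buffer of valid doubles, but has no idea which ones are missing. With the bit pattern, the same code sees a NaN in the missing slot, which at least propagates through arithmetic.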

>
> But either way, Dag's key point is right on -- in Cython (or any other
> code) -- we need to make sure it's easy to get a regular old pointer
> to a regular old C array, and not get something else by accident.

I'm sorry if I caused any confusion -- I didn't mean to suggest that 
anybody would ever remove the ability of getting a pointer to an 
unmasked array.

There is a problem of the opposite nature that's being discussed:

With masked arrays, the current situation in NumPy trunk is that if 
you're presented with a masked array, and do not explicitly check for a 
mask (i.e., all existing code), you'll transparently and without warning 
"unmask" it -- that is, an element has the last value before NA was 
assigned. This is the case whether you use PEP 3118 (np.ndarray[double] 
or double[:]), or PyArray_DATA.
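
The effect can be sketched by analogy with the existing `numpy.ma` module. This is an assumption made for illustration only: `numpy.ma` is not the NA-mask implementation in trunk, but it shows the same "last value shines through" behaviour when mask-unaware code grabs the underlying buffer.

```python
import numpy as np

values = np.ma.array([1.0, 2.0, 3.0])
values[1] = np.ma.masked          # mark the element as missing

# Mask-unaware code that simply asks for the underlying data:
raw = np.asarray(values)          # plain ndarray, the mask is dropped
buf = memoryview(raw)             # PEP 3118 view of the same memory

# The masked slot still holds the value it had before masking.
leaked = buf[1]
```

Nothing in `raw` or `buf` tells the consumer that the second element was supposed to be missing.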

According to the NEP, you should really get an exception when accessing 
through PEP 3118, but this seems to not be implemented. I don't know 
whether this was a conscious change or a lack of implementation (?).

PyArray_DATA will continue to transparently unmask data. However, with 
Travis' proposal of making a new 'ndmasked' type, old code will be 
protected; it will raise an exception for masked arrays instead of 
transparently unmasking, giving the user a chance to work around it (or 
update the code to work with masks).

Regarding new code that you write to be mask-aware, fear not -- you can 
use PyArray_DATA and PyArray_MASKNA_DATA to get the pointers. You can't 
really access the mask using np.ndarray[uint8] or uint8[:], but it 
wouldn't be a problem for NumPy to provide such access for Cython users.
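
As a rough sketch of what mask-aware code looks like once it has both buffers, here is a pure-Python stand-in that takes the data and the mask as two separate arrays, the way C code would take the two pointers. The function `masked_sum` and the convention that a nonzero mask byte means "missing" are assumptions for this example, not the layout NumPy actually uses.

```python
import numpy as np


def masked_sum(data, mask):
    """Sum the elements of `data` whose mask byte is zero.

    Convention assumed for this sketch only: nonzero mask byte = missing.
    """
    total = 0.0
    # Walk both buffers in lockstep, as a C loop over two pointers would.
    for value, flag in zip(memoryview(data), memoryview(mask)):
        if not flag:
            total += value
    return total


data = np.array([1.0, 2.0, 3.0])
mask = np.array([0, 1, 0], dtype=np.uint8)
result = masked_sum(data, mask)
```

The point is only that the consumer has to be handed, and has to loop over, two buffers instead of one.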

Regarding native Cython support for missing values: bitpatterns would be
a quick job and an uncontroversial feature. We just need to agree with
NumPy on an extension to the PEP 3118 format string, and then it takes a
few hours to implement. Masks would require quite some hashing out on
the Cython email list to figure out whether and how we would want to
support them, and would be considerably more development work as well.
How we'd even do that is much more vague to me.
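
For reference, this is roughly what a PEP 3118 consumer looks at today. The sketch below only inspects the format strings NumPy already exports. The helper `accepts_double` is hypothetical, and no NA-aware format code is shown because none has been agreed on.

```python
import numpy as np


def accepts_double(obj):
    """Hypothetical consumer check: accept only buffers of C doubles."""
    return memoryview(obj).format == "d"


doubles = np.zeros(3, dtype=np.float64)
bytes_ = np.zeros(3, dtype=np.uint8)

double_format = memoryview(doubles).format   # format code for float64
byte_format = memoryview(bytes_).format      # format code for uint8

ok_doubles = accepts_double(doubles)
ok_bytes = accepts_double(bytes_)
```

A bitpattern NA dtype would export a different format code, so a consumer written like `accepts_double` would reject it instead of silently reading NA payloads as ordinary doubles.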

Dag


