On 07/06/2011 08:10 PM, Nathaniel Smith wrote:
On Wed, Jul 6, 2011 at 6:12 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
What I'm saying is that Mark's proposal is more flexible. Say for the sake of the argument that I have two codes I need to interface with:
- Library A is written in Fortran and uses a separate (explicit) mask array for NA
- Library B runs on a GPU and uses a bit pattern for NA
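To make the two conventions concrete, here is a minimal sketch in plain NumPy. The arrays and variable names are made up for illustration; neither hypothetical library is modeled, and NaN merely stands in for "a bit pattern".

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Convention A (explicit mask): a separate boolean array, True marks NA.
mask = np.array([False, True, False, False])
mean_a = data[~mask].mean()

# Convention B (bit pattern): NA is stored in the data itself.
# NaN is used here as the sentinel; this is an assumption for the sketch.
data_b = data.copy()
data_b[mask] = np.nan
mean_b = np.nanmean(data_b)

# Converting from the bit-pattern form back to an explicit mask.
mask_from_b = np.isnan(data_b)
```

Both forms carry the same information here, but only the explicit mask keeps the original value underneath the NA.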
Have you ever encountered any such codes? I'm not aware of any code outside of R that implements the proposed NA semantics -- esp. in high-performance code, people generally want to avoid lots of conditionals, and the proposed NA semantics require a branch around every operation inside your inner loops.
I'll admit that this whole thing was a hypothetical exercise. I've interfaced with Fortran code with NA values -- not a high performance case, but not all you interface with is high performance.
Certainly there is code out there that uses NaNs, and code that uses masks (in various ways that might or might not match the way the NEP uses them). And it's easy to work with both from numpy right now. The question is whether and how the core should add some tricky and subtle semantics for a few very specific ways of handling NaN-like objects and masking.
I don't disagree with this.
It's exactly this transparency that worries Matthew and me -- we feel that the alterNEP preserves it, and the NEP attempts to erase it. In the NEP, there are two totally different underlying data structures, but this difference is blurred at the Python level. The idea is that you shouldn't have to think about which you have, but if you work with C/Fortran, then of course you do have to be constantly aware of the underlying implementation anyway. And operations which would obviously make sense for some of the objects that you know you're working with (e.g., unmasking elements from a masked array, or even accessing the mask directly using numpy slicing) are disallowed, specifically in order to make this distinction harder to make.
This worries me too. What I was thinking is that it could be sort of like indexing -- it works OK to have indexing be transparent in Python-land with respect to striding, and have a contiguous array be just a special case marked by an attribute. If you want, you can still check the strides or flags attributes.
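The indexing analogy can be sketched with existing NumPy attributes. The array shapes below are arbitrary, chosen only for illustration: indexing behaves the same on both arrays, yet the layout stays inspectable for anyone who needs it.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)  # C-contiguous
b = a[:, ::2]                                      # strided view, every other column

# Indexing is transparent: the same syntax works on both.
same_element = b[1, 1] == a[1, 2]

# The implementation detail is still available on request.
a_contig = a.flags['C_CONTIGUOUS']
b_contig = b.flags['C_CONTIGUOUS']
a_strides = a.strides   # byte steps per axis
b_strides = b.strides
shares = np.shares_memory(a, b)
```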
According to the NEP, C code that takes a masked array should never ever unmask any element; unmasking should only be done by making a full copy of the mask, and attaching it to a new view taken from the original array. Would you honestly feel obliged to follow this requirement in your C code? Or would you just unmask elements in place when it made sense, in order to save memory?
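The two strategies contrasted here can be sketched with plain NumPy arrays and an explicit boolean mask. This is not the NEP's API (which is not shown in this thread); `data`, `mask`, and the True-means-masked convention are assumptions for illustration.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])
mask = np.array([False, True, False])  # True marks a masked element (assumed)

# Strategy 1: copy the whole mask and pair it with a new view of the data.
view = data.view()        # shares the data buffer, no copy of the values
new_mask = mask.copy()    # costs one extra mask array
new_mask[1] = False       # unmask in the copy only
original_untouched = bool(mask[1])

# Strategy 2: unmask in place. No extra memory, but every holder of
# `mask` now sees the change.
alias = mask              # another reference to the same mask
mask[1] = False
alias_changed = not bool(alias[1])
```

The trade-off is the one raised in the question: strategy 1 protects other users of the mask at the cost of a copy, strategy 2 saves memory but changes what everyone else sees.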
I'm with you on this one: I wouldn't adopt any NumPy feature widely unless I had totally transparent access to the underlying implementation details from C -- without relying on any NumPy headers (except in my Cython wrappers)! I don't believe in APIs, I believe in standardized binary data. But I always assumed that could be done down the road, once the internal details had stabilized. As for myself, I'll admit that I'll almost certainly continue with explicit masking without using any of the proposed NEPs -- I have to be extremely aware of the masks in the statistical methods I use. Perhaps that's a sign I should withdraw from the discussion.

Dag Sverre