[Numpy-discussion] alterNEP - was: missing data discussion round 2

Lluís xscript at gmx.net
Thu Jun 30 16:01:07 EDT 2011


Matthew Brett writes:

> Hi,
> On Thu, Jun 30, 2011 at 7:27 PM, Lluís <xscript at gmx.net> wrote:
>> Matthew Brett writes:
>> [...]
>>> I'm afraid, like you, I'm a little lost in the world of masking,
>>> because I only need the NAs.  I was trying to see if I could come up
>>> with an API that picked up some of the syntactic convenience of NAs,
>>> without conflating NAs with IGNOREs.   I guess we need some feedback
>>> from the 'NA & IGNORE Share the API' (NISA?) proponents to get an idea
>>> of what we've missed.  @Mark, @Chuck, guys - what have we lost here by
>>> separating the APIs?
>> 
>> As I tried to convey in my other mail, separating the two will force
>> you to either:
>> 
>> * Make a copy of the array before passing it to another routine (because
>>  the routine will assign np.NA but you still want the original data)

> You have an array 'arr'.   The array does support NAs, but it doesn't
> have a mask.  You want to pass ``arr`` to another routine ``func``.
> You expect ``func`` to set NAs into the data but you don't want
> ``func`` to modify ``arr`` and you don't want to copy ``arr`` either.
> You are saying the following:

> "with the fused API, I can make ``arr`` be a masked array, and pass it
> into ``func``, and know that, when func sets elements of arr to NA, it
> will only modify the mask and not the underlying data in ``arr``."

Yes.
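
As a rough sketch of that behaviour: np.NA and the fused API are only
proposals in this thread, so the snippet below uses the existing
numpy.ma module as a stand-in, with np.ma.masked playing the role of an
NA assignment. The routine ``func`` and the sample values are
hypothetical.

```python
import numpy as np


def func(a):
    # Hypothetical routine: flags negative entries as missing.
    # Assigning np.ma.masked only sets mask bits; the data is untouched.
    a[a.data < 0] = np.ma.masked


arr = np.array([1.0, -2.0, 3.0, -4.0])

# The caller opts in to masking: a masked view that shares arr's buffer.
masked = arr.view(np.ma.MaskedArray)
func(masked)

print(masked.mask)  # which entries func marked as missing
print(arr)          # the original values are all still there
```

Here the caller alone decides that the missing markers are transient;
``func`` needs no extra argument.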


> It does seem to me this is a very obscure case.  First, ``func`` is
> modifying the array but you want an unmodified array back.  Second,
> you'll have to do some view trick to recover the not-NA case to arr,
> when it comes back.

I know, the example is just silly and convoluted.


> It seems to me, that what ``func`` should do, if it wants you to be
> able to unmask the NAs, is to make a masked array view of ``arr``, and
> return that.   And indeed the simplicity of the separated API
> immediately makes that clear - in my view at least.
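
A minimal sketch of that callee-side alternative, again using numpy.ma
as a stand-in for the proposed masked API (the routine ``func`` and the
data are made up for illustration):

```python
import numpy as np


def func(arr):
    # Hypothetical routine: builds a masked view of arr and marks the
    # negative entries as ignored, leaving arr's data unmodified.
    out = arr.view(np.ma.MaskedArray)
    out[arr < 0] = np.ma.masked
    return out


arr = np.array([1.0, -2.0, 3.0])
result = func(arr)

print(result.mean())  # computed over the unmasked entries only
print(result.data)    # "unmasking": the underlying values are intact
```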

I agree on this example. My only concern is the API's ability to
anticipate as many future use cases as possible without impacting
performance.

1) On one hand, functions must be specially crafted to handle transient
   NAs (i.e., create a masked array to store the output, which will
   possibly be optional, so it needs another function argument). Not
   everybody will foresee such usage, resulting in an inconsistent API
   w.r.t. np.NA vs. np.IGNORE. Alternatively, we could see this as a
   knob that says "whenever you would store np.NA, please use
   np.IGNORE instead". Either way, it needs collaboration from the
   callee.

2) On the other hand, it can all be controlled by the caller, who is
   really the only one who knows their own needs. This comes at the
   risk of confusing the user (although I still believe the user should
   not be confused, because the mask must be explicitly activated).

If you're telling me "2 is not necessary because functions written as 1
are few and clearly identified", then I'll just say I don't know.


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


