[Numpy-discussion] in the NA discussion, what can we agree on?

Benjamin Root ben.root at ou.edu
Wed Nov 2 23:20:15 EDT 2011


On Wednesday, November 2, 2011, Nathaniel Smith <njs at pobox.com> wrote:
> Hi Benjamin,
>
> On Wed, Nov 2, 2011 at 5:25 PM, Benjamin Root <ben.root at ou.edu> wrote:
>> I want to pare this down even more.  I think the above list makes too many
>> unneeded extrapolations.
>
> Okay. I found your formatting a little confusing, so I want to make
> sure I understood the changes you're suggesting:
>
> For the description of what MISSING means, you removed the lines:
> - Compatibility with R is valuable
> - To avoid user confusion, ideally it should *not* be possible to
> 'unmask' a missing value, since this is inconsistent with the "missing
> value" metaphor (e.g., see Wes's comment about "leaky abstractions")
>
> And you added the line:
> + Assigning MISSING is destructive
>
> And for the description of what IGNORED means, you removed the lines:
> - Some memory overhead is inevitable and acceptable
> - Compatibility with R neither possible nor valuable
> - Ability to toggle the IGNORED state of a location is critical, and
> should be as convenient as possible
>
> And you added the lines:
> + IGNORE is non-destructive
> + Must be competitive with np.ma for speed and memory (or else users
> would just use np.ma)
>
> Is that right?

Correct.

>
> Assuming it is, my thoughts are:
>
> By R compatibility, I specifically had in mind in-memory
> compatibility. rpy2 provides a more-or-less seamless within-process
> interface between R and Python (and specifically lets you get numpy
> views on arrays returned by R functions), so if we can make this work
> for R arrays containing NA too then that'd be handy. (The rpy2 author
> requested this in the last discussion here:
> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/057084.html)
> When it comes to disk formats, then this doesn't matter so much, since
> IO routines have to translate between different representations all
> the time anyway.
>

Interesting, but I still have to wonder whether that should be on the
wishlist for MISSING.  I guess it depends on whether people would be
converting from R all at once or transitioning from it gradually.  That is
something that I can't answer.

> I take the replacement of my line about MISSING disallowing unmasking
> and your line about MISSING assignment being destructive as basically
> expressing the same idea. Is that fair, or did you mean something
> else?

I am someone who wants to get to the absolute core of ideas. Also, this
phrasing cleanly frames the difference as binary: destructive versus
non-destructive.

By expressing it this way, we also shy away from implementation details.
For example, unmasking could be programmatically prevented for MISSING,
while it could be implemented by other, indirect means for IGNORE. Not that
those are the preferred ways, only that the phrasing is more flexible and
more exact.
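
To make the destructive/non-destructive distinction concrete, here is a
minimal sketch using tools that exist today. It assumes NaN as a stand-in
for MISSING and an np.ma mask as a stand-in for IGNORE. Neither is meant as
the proposed implementation; they only illustrate the two behaviors.

```python
import numpy as np

# Destructive (stand-in for MISSING): overwriting with NaN discards
# the original value; nothing in `a` remembers that it used to be 2.0.
a = np.array([1.0, 2.0, 3.0])
a[1] = np.nan

# Non-destructive (stand-in for IGNORE): masking hides the value from
# computations, but the underlying data is still stored.
b = np.ma.array([1.0, 2.0, 3.0])
b[1] = np.ma.masked
masked_sum = b.sum()      # the masked element is skipped
hidden = b.data[1]        # the original value is still there

# Clearing the mask brings the value back into play.
b.mask = False
unmasked_sum = b.sum()
```

Whether unmasking should be this easy for IGNORE, or possible at all for
MISSING, is exactly the kind of implementation detail the shorter phrasing
leaves open.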

>
> Finally, do you think that people who want IGNORED support care about
> having a convenient API for masking/unmasking values? You removed that
> line, but I don't know if that was because you disagreed with it, or
> were just trying to simplify.

See previous.

>
>> Then, as a third-party module developer, I can tell you that having separate
>> and independent ways to detect "MISSING"/"IGNORED" would likely make support
>> more difficult and would greatly benefit from a common (or easily
>> combinable) method of identification.
>
> Right, sorry... I didn't forget, and that's part of what I was
> thinking when I described the second approach as keeping them as
> *mostly*-separate interfaces... but I should have made it more
> explicit! Anyway, yes:
>
> 4) There is consensus that whatever approach is taken, there should be
> a quick and convenient way to identify values that are MISSING,
> IGNORED, or both. (E.g., functions is_MISSING, is_IGNORED,
> is_MISSING_or_IGNORED, or some equivalent.)
>

Good.
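
As a rough illustration of what "easily combinable" could look like from a
third-party module's point of view, here is a sketch of the three helpers
named above. The function bodies are hypothetical: they again assume NaN
stands in for MISSING and an np.ma mask stands in for IGNORED, which is not
something this thread has settled.

```python
import numpy as np

def is_MISSING(arr):
    """True where the stored value is NaN (assumed stand-in for MISSING)."""
    return np.isnan(np.ma.getdata(arr))

def is_IGNORED(arr):
    """True where an np.ma mask hides the value (assumed stand-in for IGNORED)."""
    return np.ma.getmaskarray(arr)

def is_MISSING_or_IGNORED(arr):
    """Single check a third-party module could use to skip both kinds."""
    return is_MISSING(arr) | is_IGNORED(arr)

# One NaN (MISSING stand-in) and one masked element (IGNORED stand-in).
x = np.ma.array([1.0, np.nan, 3.0, 4.0],
                mask=[False, False, True, False])

usable = ~is_MISSING_or_IGNORED(x)   # boolean array of usable positions
```

With a combined check like this, downstream code only needs one boolean
array to decide which elements to skip, regardless of why they are
unavailable.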

Cheers!
Ben Root