[Numpy-discussion] missing data discussion round 2

Matthew Brett matthew.brett at gmail.com
Tue Jun 28 17:59:51 EDT 2011


Hi,

On Tue, Jun 28, 2011 at 8:41 PM, Eric Firing <efiring at hawaii.edu> wrote:
> On 06/28/2011 07:26 AM, Nathaniel Smith wrote:
>> On Tue, Jun 28, 2011 at 9:38 AM, Charles R Harris
>> <charlesr.harris at gmail.com>  wrote:
>>> Nathaniel, an implementation using masks will look *exactly* like an
>>> implementation using na-dtypes from the user's point of view. Except that
>>> taking a masked view of an unmasked array allows ignoring values without
>>> destroying or copying the original data.
>>
>> Charles, I know that :-).
>>
>> But if that view thing is an advertised feature -- in fact, the key
>> selling point for the masking-based implementation, included
>> specifically to make a significant contingent of users happy -- then
>> it's certainly user-visible. And it will make other users unhappy,
>> like I said. That's life.
>>
>> But who cares? My main point is that implementing a missing data
>> solution and a separate masked array solution is probably less work
>> than implementing a single everything-to-everybody solution *anyway*,
>> *and* it might make both sets of users happier too. Notice that in my
>> proposal, there's really nothing there that isn't already in Mark's
>> NEP in some form or another, but in my version there's almost no
>> overlap between the two features. That's not because I was trying to
>> make them artificially different; it's because I tried to think of the
>> most natural ways to satisfy each set of use cases, and they're just
>> different.
>
> I think you are exaggerating some of the differences associated with the
> implementation, and ignoring one *key* difference: for integer types,
> the masked implementation can handle the full numeric range of the type,
> while the bit-pattern approach cannot.

Losing the most negative value in an int16 doesn't seem like too much,
but I agree that losing a value in int8 might be annoying.  On the other
hand, maybe it's OK if we don't support NAs for int8.
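
To make the trade-off concrete, here is a minimal sketch.  It assumes
the bit-pattern approach reserves the most negative value of the dtype
as the NA marker; that choice, and the helper names na_sentinel and
usable_range, are made up for illustration and are not part of NumPy or
of the NEP.  The existing numpy.ma module stands in for the mask-based
approach.

```python
import numpy as np


def na_sentinel(dtype):
    """Hypothetical NA marker: the most negative value of an integer dtype."""
    return int(np.iinfo(dtype).min)


def usable_range(dtype):
    """Values left for real data once the sentinel is reserved."""
    info = np.iinfo(dtype)
    return int(info.min) + 1, int(info.max)


# int8 keeps 255 of its 256 values, int16 keeps 65535 of 65536.
int8_range = usable_range(np.int8)
int16_range = usable_range(np.int16)

data = np.array([-128, -1, 0, 127], dtype=np.int8)

# Bit-pattern approach: a genuine -128 cannot be told apart from NA.
bitpattern_is_na = data == na_sentinel(np.int8)

# Mask approach: the mask lives beside the data, so -128 stays a valid
# value and any element (here the -1) can be marked as missing.
masked = np.ma.masked_array(data, mask=[False, True, False, False])
valid_values = masked.compressed()
```

With the sentinel, the first element of data is read as NA even though
it was meant as a real -128; with the mask, all 256 int8 values remain
available, at the cost of storing the extra boolean array.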

See you,

Matthew


