[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe at gmail.com
Sat Jun 25 16:25:10 EDT 2011


On Sat, Jun 25, 2011 at 9:21 AM, Charles R Harris <charlesr.harris at gmail.com
> wrote:

> On Sat, Jun 25, 2011 at 5:29 AM, Pierre GM <pgmdevlist at gmail.com> wrote:
>
>> This thread is getting quite long, innit ?
>> And I think it's getting a tad confusing, because we're mixing two
>> different concepts: missing values and masks.
>> There should be support for missing values in numpy.core, I think we all
>> agree on that.
>> * What's been suggested of adding new dtypes (nafloat, naint) is great, but
>> why not make it the default, then ?
>>
>> * Operations involving a NA (whatever the NA actually is, depending on the
>> dtype of the input) should result in a NA (whatever the NA defined by the
>> output's dtype). That could be done by overloading the existing ufuncs to
>> support the new dtypes.
>> * There should be some simple methods to retrieve the location of those
>> NAs in an array. Whether we just output the indices or a full boolean array
>> (w/ True for a NA, False for a non-NA or vice-versa) needs to be decided.
>> * We can always re-implement masked arrays to use these NAs in a way which
>> would be consistent with numpy.ma (so as not to confuse existing users of
>> numpy.ma): a mask would be a boolean array with the same shape as the
>> underlying ndarray, with True for NA.
>> Mark, I'd suggest you modify your proposal, making it clearer that it's
>> not to add all of numpy.ma's functionality to the core, but just to support
>> these missing values. Using the term 'mask' should be avoided as much as
>> possible; use 'missing data' or whatever.
>>
>
> I think he aims to support both. One complication with masks is keeping
> them tied to the data on disk. With NA values one file can contain both the
> data and the missing data markers, whereas with masks, two files would be
> required. I don't think that will fly in the long run unless there is some
> standard file format, like geotiff for GIS, that combines both.
>

Before I was leaning mostly towards masks, but now that I've come up with an
NA bit pattern approach that feels reasonable, I think implementing both
together is on the table.
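
As a rough illustration of the two approaches side by side (a sketch only:
it uses the existing numpy.ma for the mask side and NaN as a stand-in for an
NA bit pattern, since the NA dtypes discussed here do not exist yet):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Mask approach: a separate boolean array marks missing elements
# (True = missing, following the numpy.ma convention).
masked = np.ma.masked_array(data, mask=[False, True, False, False])

# Bit-pattern approach, approximated with NaN standing in for NA:
# the missing marker is stored in the data itself.
na_style = data.copy()
na_style[1] = np.nan

# Arithmetic propagates the missing marker in both cases.
masked_sum = masked + 1.0   # element 1 stays masked
na_sum = na_style + 1.0     # element 1 stays NaN

# Locating the missing elements as a full boolean array.
mask_locations = np.ma.getmaskarray(masked_sum)
na_locations = np.isnan(na_sum)
```

One difference the sketch makes visible: the masked array still holds the
original value underneath the mask (masked.data[1] is 2.0), while the
bit-pattern version has overwritten it.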

Bringing up the file format issue is good; that hasn't been covered in the
NEP yet.
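
To make the storage concern concrete, here is a small sketch (not something
from the NEP; NaN again stands in for NA, in-memory buffers stand in for
files, and the key names "data" and "mask" are just chosen for illustration):

```python
import io
import numpy as np

values = np.array([1.0, 2.0, 3.0])
missing = np.array([False, True, False])  # True marks a missing element

# In-band marker: a single array carries both the data and the markers,
# so a single plain .npy payload is enough.
in_band = np.where(missing, np.nan, values)
buf_single = io.BytesIO()
np.save(buf_single, in_band)

# Separate mask: data and mask are two arrays that must travel together.
# Bundling them in one .npz archive is one way to keep them tied.
buf_pair = io.BytesIO()
np.savez(buf_pair, data=values, mask=missing)

# Read both back.
buf_single.seek(0)
restored = np.load(buf_single)

buf_pair.seek(0)
with np.load(buf_pair) as archive:
    restored_data = archive["data"]
    restored_mask = archive["mask"]
```

The .npz container is specific to NumPy, so it does not answer the question
of a standard format that other tools would understand.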

-Mark


>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>