[Numpy-discussion] Missing data wrap-up and request for comments

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Thu May 10 05:07:41 EDT 2012


On 05/10/2012 06:05 AM, Dag Sverre Seljebotn wrote:
> On 05/10/2012 01:01 AM, Matthew Brett wrote:
>> Hi,
>>
>> On Wed, May 9, 2012 at 12:44 PM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> On 05/09/2012 06:46 PM, Travis Oliphant wrote:
>>>> Hey all,
>>>>
>>>> Nathaniel and Mark have worked very hard on a joint document to try and
>>>> explain the current status of the missing-data debate. I think they've
>>>> done an amazing job at providing some context, articulating their views
>>>> and suggesting ways forward in a mutually respectful manner. This is an
>>>> exemplary collaboration and is at the core of why open source is valuable.
>>>>
>>>> The document is available here:
>>>> https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst
>>>>
>>>> After reading that document, it appears to me that there are some
>>>> fundamentally different views on how things should move forward. I'm
>>>> also reading the document in light of my understanding of the history
>>>> of NumPy, as well as all of the users I've met and interacted with, which
>>>> means I have my own perspective that is not necessarily incorporated
>>>> into that document but informs my recommendations. I'm not sure we can
>>>> reach full consensus on this. We are also well past time for moving
>>>> forward with a resolution on this (perhaps we can all agree on that).
>>>>
>>>> I would like one more discussion thread where the technical discussion
>>>> can take place. I will make a plea that we keep this discussion as free
>>>> from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as
>>>> we can. I can't guarantee that I personally will succeed at that, but I
>>>> can tell you that I will try. That's all I'm asking of anyone else. I
>>>> recognize that there are a lot of other issues at play here besides
>>>> *just* the technical questions, but we are not going to resolve every
>>>> community issue in this technical thread.
>>>>
>>>> We need concrete proposals and so I will start with three. Please feel
>>>> free to comment on these proposals or add your own during the
>>>> discussion. I will stop paying attention to this thread next Wednesday
>>>> (May 16th) (or earlier if the thread dies) and hope that by that time we
>>>> can agree on a way forward. If we don't have agreement, then I will move
>>>> forward with what I think is the right approach. I will either write the
>>>> code myself or convince someone else to write it.
>>>>
>>>> In all cases, we have agreement that bit-pattern dtypes should be added
>>>> to NumPy. We should work on these (int32, float64, complex64, str, bool)
>>>> to start. So, the three proposals are independent of this way forward.
>>>> The proposals are all about the extra mask part:
>>>>
>>>> My three proposals:
>>>>
>>>> * do nothing and leave things as is
>>>>
>>>> * add a global flag that turns off masked array support by default but
>>>> otherwise leaves things unchanged (I'm still unclear how this would work
>>>> exactly)
>>>>
>>>> * move Mark's "masked ndarray objects" into a new fundamental type
>>>> (ndmasked), leaving the actual ndarray type unchanged. The
>>>> array_interface keeps the masked array notions and the ufuncs keep the
>>>> ability to handle arrays like ndmasked. Ideally, numpy.ma
>>>> would be changed to use ndmasked objects as their core.
>>>>
>>>> For the record, I'm currently in favor of the third proposal. Feel free
>>>> to comment on these proposals (or provide your own).
>>>>
>>>
>>> Bravo! NA-overview.rst was an excellent read. Thanks Nathaniel and Mark!
>>
>> Yes, it is very well written, my compliments to the chefs.
>>
>>> The third proposal is certainly the best one from Cython's perspective;
>>> and I imagine for those writing C extensions against the C API too.
>>> Having PyType_Check fail for ndmasked is a very good way of having code
>>> fail that is not written to take masks into account.
>
> I want to make something clearer: there are two Cython cases; in the
> case of "cdef np.ndarray[double]" there is no problem as PEP 3118 access
> will raise an exception for masked arrays.
>
> But, there's the case where you do "cdef np.ndarray", and then proceed
> to use PyArray_DATA. I do this more often than PEP 3118 access; usually
> because I pass the data pointer to some C or C++ code.
>
> It'd be great to have such code be forward-compatible in the sense that
> it raises an exception when it meets a masked array. Having PyType_Check
> fail seems like the only way? Am I wrong?

I'm very sorry; I always meant PyObject_TypeCheck, not PyType_Check.
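
To make the point concrete, here is a minimal pure-Python sketch of the idea, not actual NumPy or Cython code. PlainArray and MaskedArray are hypothetical stand-ins for ndarray and the proposed ndmasked, and raw_sum is a hypothetical stand-in for extension code that reads the data buffer directly. isinstance plays roughly the role that PyObject_TypeCheck plays at the C level: it accepts the type and its subtypes, and rejects unrelated types.

```python
class PlainArray:
    """Hypothetical stand-in for ndarray: data only, no mask."""

    def __init__(self, data):
        self.data = list(data)


class MaskedArray:
    """Hypothetical stand-in for the proposed ndmasked.

    It is deliberately NOT a subclass of PlainArray, so type checks
    written against PlainArray reject it.
    """

    def __init__(self, data, mask):
        self.data = list(data)
        self.mask = list(mask)


def raw_sum(arr):
    """Stand-in for extension code that reads the raw data directly."""
    # Python-level analogue of a PyObject_TypeCheck guard: accepts
    # PlainArray and its subclasses, rejects everything else.
    if not isinstance(arr, PlainArray):
        raise TypeError("expected a plain array, got " + type(arr).__name__)
    return sum(arr.data)


plain_total = raw_sum(PlainArray([1.0, 2.0, 3.0]))

try:
    raw_sum(MaskedArray([1.0, 2.0, 3.0], [False, True, False]))
    masked_rejected = False
except TypeError:
    masked_rejected = True
```

If the masked type were instead a subclass of the plain one (or the same type with an optional mask), the guard would pass and the mask would be silently ignored, which is the failure mode I am worried about.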

Dag

>
>
>> Mark, Nathaniel - can you comment how your chosen approaches would
>> interact with extension code?
>>
>> I'm guessing the bitpattern dtypes would be expected to cause
>> extension code to choke if the type is not supported?
>
> The proposal, as I understand it, is to use that with new dtypes (?). So
> things will often be fine for that reason:
>
> if arr.dtype == np.float32:
>     c_function_32bit(np.PyArray_DATA(arr), ...)
> else:
>     raise ValueError("need 32-bit float array")
>
>
>>
>> Mark - in:
>>
>> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#cython
>>
>> - do I understand correctly that you think that Cython and other
>> extension writers should use the numpy API to access the data rather
>> than accessing it directly via the data pointer and strides?
>
> That's not really fleshed out (for all the different usecases etc.); I
> read that as "let's discuss Cython later, when this is actively used in
> NumPy". Which sounds reasonable to me.
>
> Dag
