On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root <ben.root@ou.edu> wrote:
>
>
> On Thursday, October 27, 2011, Charles R Harris <charlesr.harris@gmail.com> wrote:
>>
>>
>> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant <oliphant@enthought.com> wrote:
>>>
>>> That is a pretty good explanation. I find myself convinced by Matthew's
>>> arguments. I think that being able to separate ABSENT from IGNORED is a
>>> good idea. I also like being able to control SKIP and PROPAGATE (but I
>>> think the current implementation allows this already).
>>>
>>> What is the counter-argument to this proposal?
>>>
>>
>> What exactly do you find convincing? The current masks propagate by
>> default:
>>
>> In [1]: a = ones(5, maskna=1)
>>
>> In [2]: a[2] = NA
>>
>> In [3]: a
>> Out[3]: array([ 1., 1., NA, 1., 1.])
>>
>> In [4]: a + 1
>> Out[4]: array([ 2., 2., NA, 2., 2.])
>>
>> In [5]: a[2] = 10
>>
>> In [6]: a
>> Out[6]: array([ 1., 1., 10., 1., 1.], maskna=True)
>>
>>
>> I don't see an essential difference between the implementation using masks
>> and one using bit patterns: the mask, when attached to the original array,
>> just adds a bit pattern by extending every type by one byte, an approach
>> that easily extends to all existing and future types, which is why Mark went
>> that way for the first implementation given the time available. The masks
>> are hidden because folks wanted something that behaved more like R and also
>> because of the desire to combine the missing, ignore, and later possibly bit
>> patterns in a unified manner. Note that the pseudo assignment was also meant
>> to look like R. Adding true bit patterns to numpy isn't trivial and I
>> believe Mark was thinking of parametrized types for that.
>>
>> The main problems I see with masks are unified storage and possibly memory
>> use. The rest is just behavior and desired API, and that can be adjusted
>> within the current implementation. There is nothing essentially masky about
>> masks.
>>
>> Chuck
>>
>>
>
> I think Chuck sums it up quite nicely. The implementation detail of
> using masks versus bit patterns can still be discussed and addressed.
> Personally, I just don't see how parameterized dtypes would be easier to use
> than the pseudo assignment.
>
> The elegance of Mark's solution was to consider the treatment of missing
> data in a unified manner. This puts missing data in a more prominent spot
> for extension builders, which should greatly improve support throughout the
> ecosystem.
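A minimal sketch of the two storage schemes debated above, masks versus bit patterns. It uses the long-standing `numpy.ma` module as a stand-in for the `maskna=` API shown in the session (that keyword is not assumed to be available here), and it assumes NaN as the reserved bit pattern for float64. The mask version keeps a parallel one-byte-per-element boolean array next to the data; the bit-pattern version spends no extra memory but can only mark values the dtype can spare.

```python
import numpy as np

# Mask scheme (numpy.ma used as a stand-in): data plus a parallel bool mask.
m = np.ma.masked_array(np.ones(5), mask=[False] * 5)
m[2] = np.ma.masked              # mark element 2 as missing

shifted = m + 1                  # arithmetic propagates the mask
total = m.sum()                  # numpy.ma reductions skip masked elements
mask_bytes = np.ma.getmaskarray(m).nbytes  # one extra byte per element

# Bit-pattern scheme (assumption: NaN is the reserved "missing" value).
b = np.ones(5)
b[2] = np.nan
propagated = b + 1               # NaN propagates through arithmetic
skipped = np.nansum(b)           # skipping has to be requested explicitly

# Assigning a real value clears the missing state in both schemes;
# numpy.ma masks are "soft" by default, so this unmasks element 2.
m[2] = 10
b[2] = 10
```

Here `shifted` has element 2 masked, `total` and `skipped` are both 4.0, and `mask_bytes` is 5 for a 5-element array, which is the memory cost mentioned above.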