[Numpy-discussion] Missing data again

Matthew Brett matthew.brett at gmail.com
Wed Mar 7 14:54:43 EST 2012


Hi,

On Wed, Mar 7, 2012 at 11:37 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Wed, Mar 7, 2012 at 12:26 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>
>> On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> > On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig
>> > <pierre.haessig at crans.org>
>> >> Coming back to Travis's proposition "bit-pattern approaches to missing
>> >> data (*at least* for float64 and int32) need to be implemented.", I
>> >> wonder how much extra work it would take to go from nafloat64 to
>> >> nafloat32/16? Is there hardware support for NaN payloads with these
>> >> smaller floats? If not, or if it is too complicated, I feel it is
>> >> acceptable to say "it's too complicated" and fall back to masks. One may
>> >> have to choose between fancy types and fancy NAs...
>> >
>> > I'm in agreement here, and that was a major consideration in making a
>> > 'masked' implementation first.
>>
>> When it comes to "missing data", bitpatterns can do everything that
>> masks can do, are no more complicated to implement, and have better
>> performance characteristics.
>>
>
> Maybe for float; for other things, no. And we have lots of other things. The
> performance argument is a strawman, and it *isn't* easier to implement.
>
>>
>> > Also, different folks adopt different values
>> > for 'missing' data, and distributing one or several masks along with the
>> > data is another common practice.
>>
>> True, but not really relevant to the current debate, because you have
>> to handle such issues as part of your general data import workflow
>> anyway, and none of these is any more complicated no matter which
>> implementations are available.
>>
>> > One inconvenience I have run into with the current API is that it should
>> > be easier to clear the mask from an "ignored" value without taking a new
>> > view or assigning known data. So maybe two types of masks (different
>> > payloads), or an additional flag could be helpful. The process of
>> > assigning masks could also be made a bit easier than using fancy indexing.
>>
>> So this, uh... this was actually the whole goal of the "alterNEP"
>> design for masks -- making all this stuff easy for people (like you,
>> apparently?) that want support for ignored values, separately from
>> missing data, and want a nice clean API for it. Basically having a
>> separate .mask attribute which was an ordinary, assignable array
>> broadcastable to the attached array's shape. Nobody seemed interested
>> in talking about it much then but maybe there's interest now?
>>
>
> Come off it, Nathaniel, the problem is minor and fixable. The intent of the
> initial implementation was to discover such things. These things are less
> accessible with the current API *precisely* because of the feedback from R
> users. It didn't start that way.
>
> We now have something to evolve into what we want. That is a heck of a lot
> more useful than endless discussion.

The endless discussion is for the following reason:

- The discussion was never adequately resolved.

The discussion was never adequately resolved because not enough work
was done to understand the various arguments. In particular, you have
several times said things that indicate to me, as to Nathaniel, that
you either have not read or have not understood the points that
Nathaniel was making.

To me, Travis' recent email also indicates that there is still a
genuine problem here that has not been adequately explored.

There is no future in trying to stop discussion, and trying to do so
will only prolong it and make it less useful.  It will make the
discussion - endless.

If you want to help - read the alterNEP, respond to it directly, and
further the discussion by engaged debate.
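
For anyone joining the thread late, here is a minimal sketch of the two
approaches being compared, written in plain Python rather than NumPy. The
names NA_BITS, make_na, is_na, na_sum and masked_sum are hypothetical, and
the NaN payload value is an arbitrary choice for illustration. This is not
how NumPy, the NEP or the alterNEP implements anything; it only assumes
IEEE 754 float64 values.

```python
import math
import struct

# Hypothetical NA marker: a quiet NaN with an arbitrary payload in the
# low mantissa bits. The payload value is an assumption for this sketch.
NA_BITS = 0x7FF8000000000000 | 0x7A2


def make_na():
    """Return a float64 whose bit pattern is the NA marker."""
    return struct.unpack("<d", struct.pack("<Q", NA_BITS))[0]


def is_na(x):
    """True only for the exact NA bit pattern, not for an ordinary NaN."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] == NA_BITS


def na_sum(values):
    """Sum the values, skipping elements that carry the NA bit pattern."""
    return sum(x for x in values if not is_na(x))


def masked_sum(data, mask):
    """Sum the data, skipping elements whose mask entry is True."""
    return sum(x for x, ignored in zip(data, mask) if not ignored)


# Bit-pattern approach: the NA marker replaces the stored value.
values = [1.0, make_na(), 3.0]

# Mask approach: the value stays in place and a separate mask hides it.
data = [1.0, 2.0, 3.0]
mask = [False, True, False]
```

The contrast that matters for this thread: setting mask[1] back to False
makes the stored 2.0 visible again, whereas writing the NA bit pattern into
values[1] overwrites whatever was there.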

Best,

Matthew


