[Numpy-discussion] Missing data again

Wed Mar 7 14:39:04 EST 2012

On Wed, Mar 7, 2012 at 1:26 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> > On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig <pierre.haessig at crans.org
> >
> >> Coming back to Travis proposition "bit-pattern approaches to missing
> >> data (*at least* for float64 and int32) need to be implemented.", I
> >> wonder what is the amount of extra work to go from nafloat64 to
> >> nafloat32/16 ? Is there an hardware support NaN payloads with these
> >> smaller floats ? If not, or if it is too complicated, I feel it is
> >> acceptable to say "it's too complicated" and fall back to mask. One may
> >> have to choose between fancy types and fancy NAs...
> >
> > I'm in agreement here, and that was a major consideration in making a
> > 'masked' implementation first.
>
> When it comes to "missing data", bitpatterns can do everything that
> masks can do, are no more complicated to implement, and have better
> performance characteristics.
>
>
Not true.  bitpatterns inherently destroys the data, while masks do not.
For matplotlib, we can not use bitpatterns because it could over-write user
data (or we have to copy the data).  I would imagine other extension
writers would have similar issues when they need to play around with input
data in a safe manner.

Also, I doubt that the performance characteristics for strings and integers
are the same as it is for masks.

Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120307/487553d6/attachment.html>