[Numpy-discussion] missing data discussion round 2

Wed Jun 29 02:53:43 EDT 2011

On 06/28/2011 11:52 PM, Matthew Brett wrote:
> Hi,
>
> On Tue, Jun 28, 2011 at 5:38 PM, Charles R Harris
> <charlesr.harris at gmail.com>  wrote:
>> Nathaniel, an implementation using masks will look *exactly* like an
>> implementation using na-dtypes from the user's point of view. Except that
>> taking a masked view of an unmasked array allows ignoring values without
>> destroying or copying the original data. The only downside I can see to an
>> implementation using masks is memory and disk storage, and perhaps memory
>> mapped arrays. And I rather expect the former to solve itself in a few
>> years: eight gigs is becoming a baseline for workstations, in a couple of
>> years I expect that to be up around 16-32, and a few years after that... In
>> any case we are talking 12% - 25% overhead, and in practice I expect it
>> won't be quite as big a problem as folks project.
>
> Or, in the case of 16 bit integers, 50% memory overhead.
>
> I honestly find it hard to believe that I will not care about memory
> use in the near future, and I don't think it's wise to make decisions
> on that assumption.

In many sciences, waiting for the future makes things worse, not better, 
simply because the amount of available data easily grows at a faster 
rate than the amount of memory you can get per dollar :-)

Dag Sverre
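
For reference, the overhead figures quoted in this thread can be reproduced
with a short NumPy sketch. It assumes a separate mask costing one byte per
element (a plain boolean array); that layout is an assumption made for
illustration, not something the thread specifies, and mask_overhead is a
hypothetical helper name.

```python
import numpy as np


def mask_overhead(dtype, mask_dtype=np.bool_):
    """Return mask bytes per element divided by data bytes per element.

    Assumes one mask element per data element (hypothetical layout).
    """
    return np.dtype(mask_dtype).itemsize / np.dtype(dtype).itemsize


# Relative memory overhead of a byte mask for a few common element types.
for dt in (np.float64, np.float32, np.int16, np.int8):
    print(f"{np.dtype(dt).name:>8}: {mask_overhead(dt):.1%}")
```

Under that assumption the overhead is 12.5% for float64, 25% for float32 and
50% for int16, matching the numbers above. A bit-packed mask would divide each
figure by eight, at the cost of more complicated indexing.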