[Numpy-discussion] Re: NumPy and None (null, NaN, missing)
Martin Maechler
maechler at stat.math.ethz.ch
Mon Apr 10 04:29:17 EDT 2000
>>>>> "TimC" == gestalt-system-discuss-admin <gestalt-system-discuss-admin at lists.sourceforge.net> writes:
TimC> Date: Sun, 09 Apr 2000 01:07:13 +1000
TimC> From: Tim Churches <tchur at bigpond.com>
TimC> Organization: Gestalt Institute
TimC> To: strang at nmr.mgh.harvard.edu, strang at bucky.nmr.mgh.harvard.edu,
TimC> gestalt-system-discuss at lists.sourceforge.net,
TimC> numpy-discussion at lists.sourceforge.net
TimC> I'm a new user of NumPy so forgive me if this is a FAQ. ......
TimC> I've been experimenting with using Gary Strangman's excellent stats.py
TimC> functions. The speed of these functions when operating on NumPy arrays
TimC> and the ability of NumPy to swallow very large arrays is remarkable.
TimC> However, one deficiency I have noticed is the lack of the ability
TimC> to represent nulls (i.e. missing values, None or NaN
TimC> [Not-a-Number]) in NumPy arrays. Missing values commonly occur in
TimC> real-life statistical data and although they are usually excluded
TimC> from most statistical calculations, it is important to be able to
TimC> keep track of the number of missing data elements and report
TimC> this.
I'm just a recent "listener" on gestalt-system-discuss,
and don't have any Python experience.
I'm a member of the R core team (www.r-project.org).
In R (and even in S-plus, but almost invisibly there),
we do differentiate between
"NA" (missing / not available) and "NaN" (IEEE result of 0/0, etc).
I'd very much like to have these kept distinct, as in R.
I think our implementation of these is quite efficient:
NA is one particular bit pattern from the whole set of possible NaN
bit patterns.
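We use code like the following (R source, src/main/arithmetic.c):

    /* hw and lw index the high and low 32-bit words of the double */
    static double R_ValueOfNA(void)
    {
        ieee_double x;
        x.word[hw] = 0x7ff00000;  /* all exponent bits set (Inf/NaN range) */
        x.word[lw] = 1954;        /* nonzero low word: a NaN, tagged as NA */
        return x.value;
    }

    int R_IsNA(double x)
    {
        if (isnan(x)) {
            ieee_double y;
            y.value = x;
            return (y.word[lw] == 1954);
        }
        return 0;
    }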
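
The excerpt above relies on definitions (ieee_double, hw, lw) that live
elsewhere in the R sources. For readers who want to try the idea in
isolation, here is a minimal standalone sketch of the same technique. The
names na_value and is_na are made up for this illustration and are not R's
API, and copying the bits through a 64-bit integer (instead of a union
indexed by word order) is my substitution so that the sketch does not depend
on endianness. It assumes IEEE 754 doubles.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration only; not part of R. */
#define NA_PAYLOAD 1954u  /* low-word tag, as in the excerpt above */

static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");

/* Build a NaN whose low 32 bits carry the NA tag. */
static double na_value(void)
{
    /* High word 0x7ff00000 sets every exponent bit; the nonzero low
       word makes the value a NaN rather than an infinity. */
    uint64_t bits = ((uint64_t)0x7ff00000u << 32) | NA_PAYLOAD;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Return 1 if x is the tagged NA value, 0 for ordinary NaNs and numbers. */
static int is_na(double x)
{
    uint64_t bits;
    if (!isnan(x))
        return 0;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & 0xffffffffu) == NA_PAYLOAD;
}
```

With these definitions, is_na(na_value()) is true, while is_na() of an
ordinary NaN (such as the result of 0.0/0.0) or of a regular number is
false, so both kinds of value can live in the same array of doubles.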
Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
TimC> Because NumPy arrays can't represent missing data via a
TimC> special value, it is necessary to exclude missing data elements
TimC> from NumPy arrays and keep track of them elsewhere (in standard
TimC> Python lists). This is messy. Also, it is quite common to use
TimC> various imputation techniques to estimate the values of missing
TimC> data elements - the ability to represent missing data in a NumPy
TimC> array and then change it to an imputed value would be a real
TimC> boon.
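
To make the workflow described above concrete, here is a rough sketch in C
of what a special in-array value allows: counting the missing elements for
reporting, then overwriting them with an imputed value. It is only an
illustration under my own assumptions. The helper names count_missing and
impute are hypothetical, a plain NaN stands in for "missing", and the mean
of the observed values is just one simple imputation choice.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; plain NaN marks a missing value. */

/* Return the number of missing (NaN) entries and store the mean of the
   observed entries in *mean_out (NaN if nothing was observed). */
static size_t count_missing(const double *v, size_t n, double *mean_out)
{
    size_t missing = 0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (isnan(v[i]))
            missing++;
        else
            sum += v[i];
    }
    *mean_out = (missing < n) ? sum / (double)(n - missing) : NAN;
    return missing;
}

/* Replace every missing entry with fill; return how many were replaced. */
static size_t impute(double *v, size_t n, double fill)
{
    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (isnan(v[i])) {
            v[i] = fill;
            replaced++;
        }
    }
    return replaced;
}
```

A caller would first run count_missing to report how many elements are
missing, then pass the returned mean (or any other estimate) to impute. The
data never leaves the array, so no separate list of missing positions is
needed.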