
"TimC" == gestalt-system-discuss-admin <gestalt-system-discuss-admin@lists.sourceforge.net> writes:
    TimC> Date: Sun, 09 Apr 2000 01:07:13 +1000
    TimC> From: Tim Churches <tchur@bigpond.com>
    TimC> Organization: Gestalt Institute
    TimC> To: strang@nmr.mgh.harvard.edu, strang@bucky.nmr.mgh.harvard.edu,
    TimC>     gestalt-system-discuss@lists.sourceforge.net,
    TimC>     numpy-discussion@lists.sourceforge.net

    TimC> I'm a new user of NumPy so forgive me if this is a FAQ.
    TimC> ......
    TimC> I've been experimenting with using Gary Strangman's excellent stats.py
    TimC> functions. The speed of these functions when operating on NumPy arrays
    TimC> and the ability of NumPy to swallow very large arrays is remarkable.
    TimC> However, one deficiency I have noticed is the lack of the ability
    TimC> to represent nulls (i.e. missing values, None or NaN
    TimC> [Not-a-Number]) in NumPy arrays. Missing values commonly occur in
    TimC> real-life statistical data and although they are usually excluded
    TimC> from most statistical calculations, it is important to be able to
    TimC> keep track of the number of missing data elements and report
    TimC> this.

I'm just a recent "listener" on gestalt-system-discuss, and don't even
have any Python experience.
I'm a member of the R core team (www.r-project.org).

In R (and even in S-plus, but almost invisibly there),
we do differentiate between
  "NA" (missing / not available)  and
  "NaN" (IEEE result of 0/0, etc).

I'd very much like to have these kept distinct, as in R.

I think our implementation of these is quite efficient:
NA is one particular bit pattern from the whole set of possible NaNs.
We use code like the following (R source, src/main/arithmetic.c):

    static double R_ValueOfNA(void)
    {
        ieee_double x;
        x.word[hw] = 0x7ff00000;
        x.word[lw] = 1954;
        return x.value;
    }

    int R_IsNA(double x)
    {
        if (isnan(x)) {
            ieee_double y;
            y.value = x;
            return (y.word[lw] == 1954);
        }
        return 0;
    }

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/

    TimC> Because NumPy arrays can't represent missing data via a
    TimC> special value, it is necessary to exclude missing data elements
    TimC> from NumPy arrays and keep track of them elsewhere (in standard
    TimC> Python lists). This is messy. Also, it is quite common to use
    TimC> various imputation techniques to estimate the values of missing
    TimC> data elements - the ability to represent missing data in a NumPy
    TimC> array and then change it to an imputed value would be a real
    TimC> boon.
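
To make the idea in the last quoted paragraph concrete, here is a minimal sketch of how an in-array NA marker could support counting and imputing missing values. It is not the R code and not anything NumPy provides: it assumes 64-bit IEEE 754 doubles, builds the bit pattern with memcpy instead of the endian-dependent word[hw]/word[lw] union, and the names NA_BITS, na_value, is_na and impute_mean are hypothetical. Mean imputation is used only because it is the simplest technique to show.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sentinel: exponent bits all ones (so the value is a NaN) with
   1954 in the low 32 bits, the same pattern as in the R snippet above. */
#define NA_BITS UINT64_C(0x7FF00000000007A2)

/* Build the NA marker from its bit pattern. */
static double na_value(void)
{
    const uint64_t bits = NA_BITS;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Return 1 if x is the NA marker, 0 for ordinary numbers and other NaNs. */
static int is_na(double x)
{
    uint64_t bits;
    if (!isnan(x))
        return 0;
    memcpy(&bits, &x, sizeof bits);
    /* Only the low 32 bits are compared, as in the R check. */
    return (uint32_t)(bits & 0xFFFFFFFFu) == 1954u;
}

/* Count the NA elements in data[0..n-1]. If at least one ordinary
   (non-NaN) value is present, overwrite each NA with the mean of the
   ordinary values. Returns the number of NA elements found. */
static size_t impute_mean(double *data, size_t n)
{
    double sum = 0.0;
    size_t observed = 0, missing = 0, i;

    for (i = 0; i < n; i++) {
        if (is_na(data[i])) {
            missing++;
        } else if (!isnan(data[i])) {
            sum += data[i];
            observed++;
        }
    }
    if (observed > 0 && missing > 0) {
        const double mean = sum / (double)observed;
        for (i = 0; i < n; i++) {
            if (is_na(data[i]))
                data[i] = mean;
        }
    }
    return missing;
}
```

Because the marker lives inside the array itself, the count of missing elements can be reported without a separate Python list, and a NaN produced by arithmetic (0/0, say) is left untouched because its low word differs from 1954.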