[Numpy-discussion] Problems with Masked Arrays and NaN values

Mon Nov 3 09:25:28 EST 2003

Hi all,

I have a problem, and am looking for help...

I am trying to use Python as a glue language for passing some
very large numeric arrays in and out of various C libraries.
These arrays can contain NaN values to indicate missing elements.

As long as on the Python level I only use Numeric to pass these
arrays around as opaque data containers, there is no problem:

- From C library FOO I obtain the huge array 'data';

- Using the PyArray_FromDimsAndData() constructor from the
  Numeric C API, I create a Numeric array that references 'data';

- In Python, I can pass the Numeric array on to e.g. VTKPython
  for visualisation. VTK has no problem with the NaNs --
  everything works.

The problem arises because I want to allow people to manipulate
these arrays from within Python as well. As its documentation
mentions, Numeric does not support NaNs, and it advises using
Masked Arrays instead.

These would indeed seem to be well-suited for the job (setting
aside the possible issues of performance and user-friendliness),
but my problem is that I do not understand how I can create an
appropriate mask for my array in the first place.

Something as simple as:

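    import MA

    # 1e30000 overflows to infinity, so this is inf/inf, i.e. NaN
    nanv = 1e30000/1e30000
    # Mask the three NaN entries by listing them explicitly
    a = MA.masked_array([0,nanv,nanv,nanv,4], mask=[0,1,1,1,0])
    # Masked entries are replaced by the fill value in the output
    print MA.filled(2 * MA.sin(a))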

works quite well, but explicit enumeration is clearly not an
option for the huge pre-existing arrays I'm dealing with.

So I would want to do something similar to:

    a = MA.masked_object([0,1,nanv,3,4], nanv)

but this simply leads to a.mask() returning None.

At first I thought this was because 'nanv == nanv' always
evaluates to False, but it turns out that in Python 2.3.2 it
actually evaluates to True -- presumably because Python's own
IEEE 754 support is lacking (if I understand PEP 754 correctly).
So why doesn't the masked_object() constructor work? Beats me...
It *does* work if I use e.g. '4' as the value parameter.
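
For reference, the comparison behaviour I was expecting can be
sketched as below. This is only an illustration, and it assumes an
interpreter whose floats follow IEEE 754 comparison rules and where
float("nan") produces a NaN (neither of which I can rely on here).
The helper name nan_mask is hypothetical.

```python
# Assumption: float("nan") yields a NaN on this interpreter.
nanv = float("nan")

# Under IEEE 754 rules a NaN never compares equal to anything,
# itself included.
print(nanv == nanv)  # False
print(nanv != nanv)  # True


def nan_mask(values):
    """Return 1 where an element is NaN, else 0 (hypothetical helper).

    Relies on the self-inequality test: only a NaN satisfies v != v.
    """
    return [int(v != v) for v in values]


data = [0.0, 1.0, nanv, 3.0, 4.0]
mask = nan_mask(data)
print(mask)  # [0, 0, 1, 0, 0]
```

If 'nanv == nanv' really is True on a given platform, this
self-inequality test cannot find the NaNs, which may be related to
what I am seeing with masked_object().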

I tried many other approaches as well, including downloading the
fpconst package mentioned in PEP 754 and trying to use its
IsNaN() as a condition to the MA.masked_where() constructor --
which doesn't work either, and gives me an exception somewhere
deep within the bowels of MA.
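
To make clear what I am after, here is a sketch in plain Python,
without MA, of the semantics I would like. It assumes a working NaN
predicate (math.isnan() here, standing in for fpconst's test), and
the names build_mask and apply_unmasked are hypothetical. A
per-element loop like this would be far too slow for my arrays; it
only shows the intended result.

```python
import math


def build_mask(values, predicate):
    """Return 1 for every element the predicate flags, else 0."""
    return [1 if predicate(v) else 0 for v in values]


def apply_unmasked(func, values, mask, fill_value):
    """Apply func to unmasked elements; masked slots get fill_value."""
    return [fill_value if m else func(v) for v, m in zip(values, mask)]


nanv = float("nan")  # assumption: this yields a NaN on this interpreter
data = [0.0, nanv, nanv, nanv, 4.0]

# Derive the mask from the data instead of enumerating it by hand.
mask = build_mask(data, math.isnan)

# Equivalent in spirit to MA.filled(2 * MA.sin(a)) with a fill of 0.0.
result = apply_unmasked(lambda x: 2 * math.sin(x), data, mask, 0.0)

print(mask)
print(result)
```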

At this point I think I've reached the end of my rope. Does
anybody reading this have any ideas on how I might beat MA into
submission, or if there are any other solutions I could try that
would allow me to manipulate large NaN-containing arrays
efficiently (or even *at all*!) from within Python? Or am I
perhaps simply (hopefully) missing something obvious?

I am eagerly looking forward to any help or advice. Many thanks
in advance,

-- 
Leo Breebaart  <leo at lspace.org>