[Numpy-discussion] Warnings in numpy.ma.test()

Christopher Barker Chris.Barker at noaa.gov
Thu Mar 18 15:46:21 EDT 2010

Gael Varoquaux wrote:
> On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
>> sure -- that's kind of my point -- if EVERY numpy array were 
>> (potentially) masked, then folks would write code to deal with them 
>> appropriately.
> That's pretty much saying: "I have a complicated problem and I want
> everyone else to have to deal with the full complexity of it, even if they
> have a simple problem".

Well -- I did say it was a fantasy...

But I disagree -- having invalid data is a very common case. What we 
have now is a situation where we have two parallel systems, masked 
arrays and regular arrays. Each time someone does something new with 
masked arrays, they often find another missing feature, and have to 
solve that. Also, the fact that masked arrays are tacked on means that 
performance suffers.

Maybe it would simply be too ugly, but if I were to start from the 
ground up with a scientific computing package, I would want to put in 
support for missing values from the start.

There are some cases where it is simply too complicated or too expensive 
to handle missing values -- fine, then an exception is raised.

You may be right about how complicated it would be, and what would 
happen is that everyone would simply put a:

if a.masked:
    raise TypeError("I can't deal with masked data")

stanza at the top of every new method they wrote, but I suspect that if 
the core infrastructure was in place, it would be used.
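
For comparison, a guard like that can already be written against the 
existing numpy.ma module. This is only a minimal sketch: the function 
name reject_masked is made up for illustration, and np.ma.is_masked() 
stands in for the hypothetical a.masked attribute above.

```python
import numpy as np

def reject_masked(a):
    """Hypothetical guard: refuse any input that carries masked values."""
    # np.ma.is_masked() is True only if at least one element is masked;
    # plain ndarrays and lists always pass.
    if np.ma.is_masked(a):
        raise ValueError("I can't deal with masked data")
    # Nothing is masked, so it is safe to hand back a plain ndarray.
    return np.asarray(a)
```

Plain arrays pass through unchanged, and a masked array with nothing 
actually masked is accepted as well.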

I'm facing this at the moment: not a big deal, but I'm using histogram2d 
on some data. I just realized that it may have some NaNs in it, and I 
have no idea how those are being handled. I also want to move to masked 
arrays and have no idea if histogram2d can deal with those. At the 
least, I need to do some testing, and I suspect I'll need to do some 
hacking on histogram2d (or just write my own).
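
One way the "write my own" route might look is sketched below. This is 
an assumption about the desired behaviour, not a description of what 
histogram2d does today: the wrapper name histogram2d_valid is 
hypothetical, and it simply drops every (x, y) pair in which either 
value is masked, NaN, or infinite before calling np.histogram2d.

```python
import numpy as np

def histogram2d_valid(x, y, bins=10):
    """Hypothetical wrapper: histogram only the pairs where both values are valid."""
    # Accept lists, plain arrays, or masked arrays alike.
    x = np.ma.asarray(x, dtype=float)
    y = np.ma.asarray(y, dtype=float)
    # A pair is kept only if both values are finite ...
    keep = np.isfinite(x.data) & np.isfinite(y.data)
    # ... and neither value is masked.
    keep &= ~(np.ma.getmaskarray(x) | np.ma.getmaskarray(y))
    return np.histogram2d(x.data[keep], y.data[keep], bins=bins)
```

Dropping the invalid pairs is only one possible policy; counting them 
separately or raising an exception would be equally defensible.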

I'm sure I'm not the only one in the world that needs to histogram some 
data that may have invalid values -- so wouldn't it be nice if that were 
already handled in a defined way?


Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
