[Numpy-discussion] behavior of masked arrays

Fri Mar 7 10:37:04 EST 2008

On Friday 07 March 2008 09:56:59 Giorgio F. Gilestro wrote:
> Hi Everybody,

> My understanding is that only a few functions will be able to properly
> use MA during execution. Is this correct or am I missing something here?

Giorgio,
You're right: Scipy does not fully support masked arrays yet. I ported 
some functions I needed for my own research; you'll find them in 
numpy.ma.mstats and numpy.ma.morestats, but many, many more are missing.

In your particular example, masked arrays are silently converted to 
regular ndarrays by the internal call to numpy.asarray in _chk2_asarray. 
Therefore, you lose your mask.
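The mask loss can be reproduced with numpy alone. The sketch below uses made-up data with one masked outlier. It does not call _chk2_asarray itself; it only contrasts numpy.asarray, which returns a base-class ndarray, with numpy.asanyarray, which lets subclasses such as MaskedArray through:

```python
import numpy as np
import numpy.ma as ma

# Made-up data: the last value is an outlier we want to ignore via the mask.
a = ma.array([1.0, 2.0, 3.0, 100.0], mask=[False, False, False, True])

plain = np.asarray(a)     # base-class ndarray: the mask is dropped
kept = np.asanyarray(a)   # subclass passed through: still a MaskedArray

print(type(plain).__name__, plain.mean())  # ndarray 26.5
print(type(kept).__name__, kept.mean())    # MaskedArray 2.0
```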

You have several options:
1. Rewrite the function(s) you need so that masked arrays are properly 
handled. In your case, that would mean rewriting _chk2_asarray to use 
numpy.asanyarray instead of numpy.asarray, and using the numpy.ma functions 
instead of their numpy counterparts (that last step might not be necessary, 
but we need to check). 

2. Instead of masked arrays, use compressed arrays, that is, arrays where the 
missing values have been discarded with a.compressed(). That way, you have 
plain ndarrays that are processed properly. 
In your case, that would mean defining a common mask for your samples, 
selecting the rows/columns depending on your axis, and applying ttest_ind to 
each compressed row/column.
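Option #2 might look roughly like the sketch below. compressed_pairs is a hypothetical helper that assumes both samples share the same 2-D shape, and pooled_t is only a stand-in for ttest_ind (pooled-variance t statistic) so that the example runs without scipy; in real use you would pass each compressed pair to ttest_ind instead:

```python
import numpy as np
import numpy.ma as ma


def compressed_pairs(a, b, axis=0):
    # Hypothetical helper: a and b must have the same 2-D shape.
    # One common mask makes both samples drop the same positions.
    common = ma.mask_or(ma.getmaskarray(a), ma.getmaskarray(b))
    a = ma.array(a, mask=common)
    b = ma.array(b, mask=common)
    if axis == 0:
        # Transpose so that iterating yields columns instead of rows.
        a, b = a.T, b.T
    for row_a, row_b in zip(a, b):
        # compressed() returns a plain 1-D ndarray of the unmasked values.
        yield row_a.compressed(), row_b.compressed()


def pooled_t(x, y):
    # Stand-in for ttest_ind (statistic only), working on plain ndarrays.
    n1, n2 = x.size, y.size
    svar = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    return (x.mean() - y.mean()) / np.sqrt(svar * (1.0 / n1 + 1.0 / n2))


# Made-up samples: the 99.0 in the second column of `a` is masked.
a = ma.array([[1.0, 5.0], [2.0, 6.0], [3.0, 7.0], [4.0, 99.0]],
             mask=[[0, 0], [0, 0], [0, 0], [0, 1]])
b = np.array([[2.0, 6.0], [3.0, 7.0], [4.0, 8.0], [5.0, 9.0]])

# One statistic per column, each computed on the compressed data only.
stats = [pooled_t(x, y) for x, y in compressed_pairs(a, b, axis=0)]
```

Because the mask is shared, the second column drops its last row in both samples, so each test compares equally filtered data.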

Of course, solution #1 sounds like the best one for the community.
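A minimal sketch of option #1, under stated assumptions: _chk2_asanyarray and masked_ttest_ind are hypothetical names, not scipy's API, and the statistic is the classic pooled-variance (equal-variance) two-sample t. Only numpy and numpy.ma calls are used; means and variances go through methods, so MaskedArray.mean and MaskedArray.var handle the mask, while plain ndarrays are treated as fully valid:

```python
import numpy as np
import numpy.ma as ma


def _chk2_asanyarray(a, b, axis):
    # Hypothetical mask-preserving variant of scipy's private _chk2_asarray:
    # np.asanyarray passes MaskedArray through, np.asarray would strip it.
    a = np.asanyarray(a)
    b = np.asanyarray(b)
    if axis is None:
        # Flatten both samples and test over the single remaining axis.
        a, b, axis = a.ravel(), b.ravel(), 0
    return a, b, axis


def masked_ttest_ind(a, b, axis=0):
    # Hypothetical: pooled-variance two-sample t statistic and degrees of freedom.
    a, b, axis = _chk2_asanyarray(a, b, axis)
    # ma.count counts only unmasked entries (all entries for plain ndarrays).
    n1 = ma.count(a, axis=axis)
    n2 = ma.count(b, axis=axis)
    # Methods dispatch to MaskedArray.var / MaskedArray.mean when masked.
    v1 = a.var(axis=axis, ddof=1)
    v2 = b.var(axis=axis, ddof=1)
    df = n1 + n2 - 2
    svar = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    t = (a.mean(axis=axis) - b.mean(axis=axis)) / np.sqrt(svar * (1.0 / n1 + 1.0 / n2))
    return t, df


# Made-up data: the 1000.0 in `a` is masked and must not affect the result.
a = ma.array([1.0, 2.0, 3.0, 4.0, 1000.0], mask=[0, 0, 0, 0, 1])
b = np.array([2.0, 3.0, 4.0, 5.0])
t, df = masked_ttest_ind(a, b)
```

Calling masked_ttest_ind(a.compressed(), b) should give the same statistic, which is a quick check that the mask is respected.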

On a side note:
* That particular function (ttest_ind) calls mean and var as functions. I'm 
sure it would be better to use the corresponding methods; that way, masked 
arrays could be taken into account more easily.