[Numpy-discussion] behavior of masked arrays

Fri Mar 7 12:25:13 EST 2008

Ok, I see, thank you Pierre.
I thought scipy.stats would be a widely used extension, so I 
didn't really consider the trivial possibility that it simply wasn't 
compatible with ma yet.
I had a quick look at the code, and it really seems that ma support could 
be added by replacing np.asarray with np.ma.asarray and, here and there, 
replacing some functions with the corresponding methods (like ravel).
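A minimal sketch of the difference described above, assuming a current NumPy where `np.ma` is the masked-array module: `np.asarray` hands back the underlying data and silently drops the mask, while `np.ma.asarray` keeps it, so the `.mean()` method then skips the masked entry. The sample values are made up for illustration.

```python
import numpy as np

# A sample whose last value is missing and flagged by the mask.
a = np.ma.array([1.0, 2.0, 99.0], mask=[False, False, True])

plain = np.asarray(a)    # plain ndarray: the mask is discarded, 99.0 is back
kept = np.ma.asarray(a)  # still a MaskedArray: the mask survives

print(type(plain).__name__, plain.mean())  # ndarray 34.0
print(type(kept).__name__, kept.mean())    # MaskedArray 1.5

# Methods dispatch on the object itself, so ravel() keeps the mask too.
flat = kept.reshape(1, 3).ravel()
print(flat.mask.tolist())                  # [False, False, True]
```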

However, I just saw at
http://scipy.org/scipy/scipy/wiki/StatisticsReview
that April and May are going to be the Statistics Review months, so I don't 
think it is a good idea to go ahead and fix things myself now :-) I think I 
will go through the open tickets at
http://scipy.org/scipy/scipy/query?status=new&status=assigned&status=reopened&milestone=Statistics+Review+Months&order=priority
and see what I can do.

Thanks

Pierre GM wrote:
> On Friday 07 March 2008 09:56:59 Giorgio F. Gilestro wrote:
>   
>> Hi Everybody,
>>     
>
>   
>> My understanding is that only a few functions will be able to properly
>> use MA during execution. Is this correct or am I missing something here?
>>     
>
> Giorgio,
> You're right: there's no full support of masked arrays in Scipy yet. I ported 
> some functions I needed for my own research; you'll find them in 
> numpy.ma.mstats and numpy.ma.morestats, but many, many more are missing.
>
> In your particular example, masked arrays are simply/silently converted to 
> regular ndarrays by the internal use of numpy.asarray in _chk2_asarray. 
> Therefore, you're losing your mask...
>
> You have several options:
> 1. Rewrite the function(s) you need to make sure masked arrays are properly 
> handled. In your case, that'd mean rewriting _chk2_asarray to use 
> numpy.asanyarray instead of numpy.asarray, and using the numpy.ma functions 
> instead of their numpy counterparts (that last step might not be necessary, 
> but we need to check that). 
>
> 2. Don't use masked arrays, but compressed arrays, that is, arrays where the 
> missing values have been discarded with a.compressed(). That way, you have 
> ndarrays that are processed properly. 
> In your case, that would mean defining a common mask for your samples, 
> selecting the rows/columns depending on your axis, and applying ttest_ind 
> to each compressed row/column.
>
> Of course, the #1 solution sounds like the best for the community.
>
> On a side note:
> * That particular function (ttest_ind) uses mean and var as functions; I'm 
> sure it would be better to use the corresponding methods, so that masked 
> arrays could be handled more easily.
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>   
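A minimal sketch of option 1 above, assuming a current NumPy. `chk2_asanyarray` and `t_statistic` are hypothetical names written for illustration; they are not SciPy's `_chk2_asarray` or `ttest_ind`, and `t_statistic` returns only the pooled-variance t statistic, without a p-value. The helper uses `np.asanyarray` in place of `np.asarray`, and the statistic calls `.mean()` and `.var()` as methods, following the side note, so masked entries are skipped whenever a MaskedArray comes in.

```python
import numpy as np

def chk2_asanyarray(a, b, axis):
    """Hypothetical mask-preserving stand-in for a _chk2_asarray-style helper.

    np.asanyarray passes subclasses such as MaskedArray through unchanged,
    whereas np.asarray would strip them down to a plain ndarray.
    """
    a = np.asanyarray(a)
    b = np.asanyarray(b)
    if axis is None:
        # Flatten with the method so that a MaskedArray stays masked.
        a, b, axis = a.ravel(), b.ravel(), 0
    return a, b, axis

def t_statistic(a, b, axis=0):
    """Pooled-variance two-sample t statistic (illustrative, no p-value)."""
    # Count only the unmasked values; plain ndarrays count every element.
    na = np.ma.asarray(a).count(axis=axis)
    nb = np.ma.asarray(b).count(axis=axis)
    # Methods dispatch on the array type: masked entries are skipped for
    # MaskedArray input, ordinary reductions are used for ndarray input.
    va = a.var(axis=axis, ddof=1)
    vb = b.var(axis=axis, ddof=1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    diff = a.mean(axis=axis) - b.mean(axis=axis)
    return diff / np.sqrt(pooled * (1.0 / na + 1.0 / nb))

# Usage: the 100.0 is flagged as missing and must not affect the result.
x = np.ma.array([1.0, 2.0, 3.0, 100.0], mask=[False, False, False, True])
y = np.array([4.0, 5.0, 6.0])
xa, ya, ax = chk2_asanyarray(x, y, None)
print(type(xa).__name__)                         # MaskedArray
print(round(float(t_statistic(xa, ya, ax)), 4))  # -3.6742
```

Because the masked value is ignored, the result matches what `t_statistic(x.compressed(), y)` gives on the three valid values.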

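A sketch of option 2 above, again with hypothetical names (`pooled_t`, `rowwise_t_compressed`) and assuming the test runs along rows (axis=1). To keep the example NumPy-only, `pooled_t` is a small stand-in for `scipy.stats.ttest_ind` that computes only the pooled-variance t statistic; in practice the compressed rows would be passed to `ttest_ind` itself. A value missing in either sample is dropped from both through a common mask, as suggested in the reply.

```python
import numpy as np

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance (plain ndarrays)."""
    na, nb = a.size, b.size
    pooled = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled * (1.0 / na + 1.0 / nb))

def rowwise_t_compressed(x, y):
    """Apply pooled_t to each row after discarding the missing values."""
    # Common mask: True wherever either sample is missing.
    common = np.ma.mask_or(np.ma.getmaskarray(x), np.ma.getmaskarray(y))
    xs = np.ma.array(np.ma.getdata(x), mask=common)
    ys = np.ma.array(np.ma.getdata(y), mask=common)
    # compressed() turns each masked row into a plain ndarray of valid values.
    return np.array([pooled_t(xs[i].compressed(), ys[i].compressed())
                     for i in range(xs.shape[0])])

x = np.ma.array([[1.0, 2.0, 3.0, 9.0],
                 [2.0, 4.0, 6.0, 8.0]],
                mask=[[False, False, False, True],
                      [False, False, False, False]])
y = np.array([[4.0, 5.0, 6.0, 0.0],
              [1.0, 3.0, 5.0, 7.0]])
print(rowwise_t_compressed(x, y))  # row 0 uses only its first three pairs
```
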
-- 
giorgio at gilestro.tk
http://www.cafelamarck.it