[Numpy-discussion] Re: empty_like for masked arrays

Pierre GM pgmdevlist at gmail.com
Mon Jun 10 18:40:30 EDT 2013

On June 10, 2013 at 23:07:24, Eric Firing (efiring at hawaii.edu) wrote:
On 2013/06/10 10:17 AM, Aldcroft, Thomas wrote:  
> I use np.ma, and for me the most intuitive would be the  
> second option where the new array matches the original array in shape  
> and dtype, but always has an empty mask. I always think of the *_like()  
> functions as just copying shape and dtype, so it would be a bit  
> surprising to get part of the data (the mask) from the original. If you  
> do need the mask then on the next line you have an explicit statement to  
> copy the mask and the code and intent will be clear. Also, most of the  
> time the mask is set because that particular data value was bad or  
> missing, so it seems like it would be a less-common use case to want a  
> new empty array with the same mask.  

I also use np.ma (and it is used internally in matplotlib). I agree  
with Tom. I think all of the *_like() functions should start with  
mask=False, meaning nothing is masked by default. I don't see what the  
reasonable use cases would be for any alternative.  

I too agree with Eric and Tom: having the mask set to `np.ma.nomask` by default makes more sense for the `*_like` functions than having a mask matching the input.
AFAICR, these functions were introduced for the sake of completeness. I must admit I can't recall why there's an implementation difference between `np.ma.empty` and `np.ma.empty_like`, though… There must have been a corner case at one point or another.
It should be relatively easy to force `_convert2ma` to set the mask of the output to `False`. Note that while that would solve the problem for `empty_like`, `ones_like` and `zeros_like` would have to be derived from `_convert2ma` too. Right now, `ones_like` and `zeros_like` are just their exact `np` counterparts.
The problem of the mask shared between the input and output comes from the fact that in `_convert2ma`, the mask of the output is just a view of the mask of the input, not a copy.
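In the meantime, user code can sidestep the shared mask by building the new array from the shape and dtype alone. The snippet below is only a minimal sketch of that idea, not a proposed patch to `_convert2ma`: the helper name `empty_like_unmasked` is made up for illustration, and it relies only on `np.ma.empty` and `np.ma.getmaskarray`.

```python
import numpy as np


def empty_like_unmasked(arr):
    """Return an uninitialized masked array with the shape and dtype of
    `arr`, but with nothing masked and no mask shared with `arr`.

    Hypothetical helper for illustration; not part of numpy.ma.
    """
    # np.ma.empty only sees the shape and dtype, so the mask of `arr`
    # cannot leak into (or be shared with) the result.
    return np.ma.empty(arr.shape, dtype=arr.dtype)


if __name__ == "__main__":
    src = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
    out = empty_like_unmasked(src)

    # Masking an element of the new array leaves the source mask alone.
    out[0] = np.ma.masked
    print(np.ma.getmaskarray(src))
    print(np.ma.getmaskarray(out))
```

If the mask of the input is wanted after all, it can be copied explicitly on the next line, e.g. `out.mask = np.ma.getmaskarray(src).copy()`, which keeps the intent visible in the calling code.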
