[Numpy-discussion] Questions about masked arrays

Pierre GM pgmdevlist at gmail.com
Wed Oct 7 01:47:53 EDT 2009


On Oct 7, 2009, at 1:12 AM, Gökhan Sever wrote:
> One more from me:
> I[1]: a = np.arange(5)
> I[2]: mask = 999
> I[6]: a[3] = 999
> I[7]: am = ma.masked_equal(a, mask)
>
> I[8]: am
> O[8]:
> masked_array(data = [0 1 2 -- 4],
>              mask = [False False False  True False],
>        fill_value = 999999)
>
> Where does this fill_value come from? To me it is a little confusing
> having a "value" and "fill_value" in masked array method arguments.

Because the two are unrelated. The `fill_value` is the value used to
fill the masked elements (that is, the missing entries), for example
when you call `filled()`.
When you create a masked array, you get a `fill_value` whose default
is determined by the dtype of the array: for int it's 999999, for
float 1e+20, you get the idea.
The value you used for masking is different: it's just whatever value
you consider invalid. Now, if I follow you, you would expect the value
in `masked_equal(array, value)` to be the `fill_value` of the output.
That's an idea. Would you mind filing a ticket/enhancement and
assigning it to me, so that I don't forget?
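
To make the distinction concrete, here is a minimal sketch. It uses
`masked_where` instead of `masked_equal` so that the dtype-based default
shows up regardless of NumPy version (as far as I know, later releases of
`masked_equal` do set the output `fill_value` to `value`). The variable
names are only illustrative.

```python
import numpy as np
import numpy.ma as ma

a = np.arange(5)
a[3] = 999                      # 999 is the value we consider invalid

# Mask the invalid entries; the masking value is not stored as fill_value.
am = ma.masked_where(a == 999, a)

# Defaults depend only on the dtype.
int_default = ma.default_fill_value(a)      # default for an int array
float_default = ma.default_fill_value(1.0)  # default for a float

# filled() replaces masked entries with the current fill_value.
filled_default = am.filled()

# The fill_value can be changed independently of the masking value.
am.fill_value = -1
filled_custom = am.filled()

print(int_default, float_default)
print(filled_default)
print(filled_custom)
```

The masking value (999) only decides which entries are hidden; the
`fill_value` only decides what replaces them when the array is filled.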


> Probably you can pin-point the error by testing a 1.3.0 version  
> numpy. Not too many arc function with masked array users around I  
> guess :)

Will try, but "if it ain't broken, don't fix it"...

> assert(np.arccos(ma.masked), ma.masked) would be the simplest.

(and in fact, it'd be assert(np.arccos(ma.masked) is ma.masked) in  
this case).
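
A short sketch of why the form matters: `assert(x, y)` asserts a
two-element tuple, which is always true, so it can never fail. The check
below uses an identity test instead. Note that it calls `ma.arccos`, the
masked-aware version, as an assumption on my side; whether `np.arccos`
gives the same object depends on the NumPy version, which is the very
thing under discussion here.

```python
import numpy.ma as ma

# Applying the masked-aware arccos to the masked constant.
result = ma.arccos(ma.masked)

# Identity test: the result should be the masked singleton itself.
assert result is ma.masked

# A non-empty tuple is truthy whatever it contains, which is why
# assert(x, y) with parentheses never fails.
tuple_is_truthy = bool((False, False))
print(tuple_is_truthy)
```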


> Good to know this. The more I spend time with numpy the more I  
> understand the importance of testing the code automatically. This  
> said, I still find the test-driven-development approach somewhat  
> bizarre. Start only by writing test code and keep implementing your  
> code until all the tests are satisfied. Very interesting...These  
> software engineers...

Bah, it's not a rule cast in iron... You can start writing your code  
but do write the tests at the same time. It's the best way to make  
sure you're not breaking something later on.