[Numpy-discussion] Re: ndarray.fill and ma.array.filled

Bruce Southey bsouthey at gmail.com
Tue Apr 11 12:47:02 EDT 2006


Hi,
My view is solely as a user, so I really do appreciate the thought that
you all are putting into this!

I am somewhat concerned that having to use filled() adds an extra level
of complexity and computational burden. For example, computing the
mean/average using filled() would require one pass to get the sum
and another to count the non-masked elements.
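
To make the two passes concrete, here is a rough sketch. It assumes the
present-day numpy.ma interface (ma.array, filled(), count()); the array
values are made up purely for illustration.

```python
import numpy.ma as ma

# Illustrative data: the second element is masked.
a = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])

# Pass 1: replace masked entries with 0 so they do not contribute to the sum.
total = a.filled(0).sum()

# Pass 2: count the entries that are not masked.
n_valid = a.count()

# The mean over the non-masked elements needs both results.
mean_via_filled = total / n_valid
```

If I read the masked array behaviour correctly, a.mean() should give the
same value here without the explicit fill.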

For summation at least, would it make more sense to add optional
flag(s) so that there appears to be little difference between a normal
array and a masked array?

For example,
a.sum() is the current default
a.sum(filled_value=x) where x is some value such as zero or other user
defined value.
a.sum(ignore_mask=True) or similar to address whether or not masked
values should be used.
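
The semantics I have in mind can be sketched with a small helper. Note
that masked_sum, filled_value and ignore_mask are hypothetical names for
this proposal only, not existing numpy keywords; the sketch assumes the
present-day numpy.ma interface (ma.array, ma.getdata, filled()).

```python
import numpy.ma as ma

def masked_sum(a, filled_value=None, ignore_mask=False):
    """Hypothetical helper mimicking the proposed sum() keywords."""
    if ignore_mask:
        # Sum the underlying data, including values hidden by the mask.
        return ma.getdata(a).sum()
    if filled_value is not None:
        # Substitute the user-supplied value for masked entries, then sum.
        return a.filled(filled_value).sum()
    # Default: whatever the masked array's own sum() does.
    return a.sum()

# Illustrative data: the second element is masked.
a = ma.array([1, 2, 3, 4], mask=[0, 1, 0, 0])

default_sum = masked_sum(a)                  # masked value skipped
filled_sum = masked_sum(a, filled_value=10)  # masked value replaced by 10
raw_sum = masked_sum(a, ignore_mask=True)    # mask disregarded entirely
```

With keywords like these on sum() itself, the caller would not need a
separate filled() step.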

I am also not clear on what happens with other operations or dimensions.

Regards
Bruce

On 4/10/06, Pierre GM <pierregm at engr.uga.edu> wrote:
> > [Sasha]
> > > So ? The result is not `masked`, the missing value has been omitted.
> > I am just making your point with a shorter example.
>
> OK, now I get it :)
>
>
> > >Er, why would I want to get MA.masked along one axis if one value is
> > > masked  ?
> >
> > [Tim]
> > Any number of reasons I would think.
>
> I understand that, and I eventually agree it should be the default.
>
> > [Sasha]
> > Because if you don't know one of the addends you don't know the sum.
> Unless you want to discard some data on purpose.
>
> > Replacing missing values with zeros is not always the right strategy.
> > If you know that your data has non-zero mean, for example, you might
> > want to replace missing values with the mean instead of zero.
> Hence the need to get rid of filled_values
>
> >[Tim]
> > Actually I'm going to ask you the same question. Why would you care if all
> > of the values are masked?
>
> > > MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum()
> > > array(data = [1 999999],   mask = [False True], fill_value=999999)
> >
> > [Sasha]
> > I did not realize that, but it is really bad. What is the
> > justification for this?
>
> Masked values are not necessarily nans or missing. I quite regularly mask
> values that do not satisfy a given condition. For various reasons, I can't
> compress the array, I need to preserve its shape.
>
> With the current behavior, a.sum() gives me the sum of the values that satisfy
> the condition. If there's no such value, the result is masked, and that way I
> know that the condition was never met. Here, I could use Sasha's method
> combined with a._mask.all, no problem
>
> Another example: let x be a 2D array with missing values, to be normalized along
> one axis. Currently, x/x.sum() give the result I want (provided it's true
> division). Sasha's method would give me a completely masked array.
>
>
> > > Good points... We'll just have to put strong warnings everywhere.
> > [Sasha]
> > Do you agree with my proposal as long as we have explicit warnings in
> > the documentation that methods behave differently from legacy
> > functions?
>
> Your points are quite valid. I'm just worried it's gonna break a lot of things
> in the near future. And where do we stop ? So, if we follow Sasha's way:
> x.prod() should be the same, right ? What about a.min(), a.max() ? a.mean() ?