[Numpy-discussion] Masked Arrays in NumPy 1.x

Nathaniel Smith njs at pobox.com
Mon Apr 23 16:40:04 EDT 2012


Hi Paul,

On Wed, Apr 11, 2012 at 8:57 PM, Paul Hobson <pmhobson at gmail.com> wrote:
> Travis et al,
>
> This isn't a reply to anything specific in your email and I apologize
> if there is a better thread or place to share this information. I've
> been meaning to participate in the discussion for a long time and
> never got around to it. The main thing I'd like to do is convey my
> typical use of the numpy.ma module as an environmental engineer
> analyzing censored datasets --contaminant concentrations that are
> either at well understood values (not masked) or some unknown value
> below an upper bound (masked).
>
> My basic understanding is that this discussion revolved around how to
> treat masked data (ignored vs missing) and how to implement one, both,
> or some middle ground between those two concepts. If I'm off-base,
> just ignore all of the following.
>
> For my purposes, numpy.ma is implemented in a way very well suited to
> my needs. Here's a gist of something that was *really* hard for me
> before I discovered numpy.ma and numpy in general. (this is a bit
> much, see below for the highlights)
> https://gist.github.com/2361814
>
> The main message here is that I include the upper bounds of the
> unknown values (detection limits) in my array and use that to
> statistically estimate their values. I must be able to retrieve the
> masked detection limits throughout this process. Additionally the
> masks as currently implemented allow me to sort first the undetected
> values, then the detected values (see __rosRanks in the gist).
>
> As a boots-on-the-ground user of numpy, I'm ecstatic that this tool
> exists. I'm also pretty flexible and don't anticipate any major snags
> in my work if things change dramatically as the masked/missing/ignored
> functionality evolves.
>
> Thanks to everyone for the hard work and great tools,
> -Paul Hobson
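
A minimal sketch of the pattern described above, assuming non-detects are stored as masked entries whose underlying value is the detection limit. The concentrations are made-up illustration data, and the `np.lexsort` ordering is just one way to put non-detects first. It is not taken from the linked gist and is not the gist's `__rosRanks` routine.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical data: masked entries are non-detects, and the value stored
# "behind" each mask is that sample's detection limit (an upper bound).
values = np.array([2.0, 0.5, 3.1, 1.0, 0.8, 4.2])
nondetect = np.array([False, True, False, True, True, False])
conc = ma.masked_array(values, mask=nondetect)

# Reductions skip masked entries, so this is the mean of detected values only.
detected_mean = conc.mean()

# The detection limits remain retrievable from the underlying data.
mask = ma.getmaskarray(conc)
detection_limits = ma.getdata(conc)[mask]

# Order indices with non-detects first, then detects, each group sorted by
# value. np.lexsort uses the LAST key as the primary sort key.
order = np.lexsort((ma.getdata(conc), ~mask))

print(detected_mean)     # mean of 2.0, 3.1, 4.2
print(detection_limits)  # [0.5 1.  0.8]
print(order)             # [1 4 3 0 2 5]
```

The mask lets ordinary reductions ignore the non-detects, while `ma.getdata` still exposes the bounds for any later estimation step.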

Thanks for this note. Getting feedback from people on how they're
actually using numpy.ma is *very* helpful, because there's a lot more
data out there on the "missing data" use case.

But, I couldn't quite figure out what you're actually doing in this
code. It looks like the measurements that you're masking out have some
values "hidden behind" the mask, which you then make use of?
Unfortunately, I don't know anything about environmental engineering
or the method of Hirsch and Stedinger (1987). Could you elaborate a
bit on what these masked values mean and how you process them?

-- Nathaniel


