[Numpy-discussion] Missing/accumulating data

Charles R Harris charlesr.harris at gmail.com
Tue Jun 28 13:34:38 EDT 2011


On Tue, Jun 28, 2011 at 10:50 AM, Joe Harrington <jh at physics.ucf.edu> wrote:

> As with Travis, I have not had time to wade through the 150+ messages
> on masked arrays, but I'd like to raise a concept I've mentioned in
> the past that would enable a broader use if done slightly differently.
> That is, the "masked array" problem is a subset of this more-general
> problem.  Please respond on this thread if you want me to see the
> replies.
>
> The problem is of data accumulation and averaging, and it comes up all
> the time in astronomy, medical imaging, and other fields that handle
> stacks of images.  Say that I observe a planet, taking images at, say,
> 30 Hz for several hours, as with many video cameras.  I want to
> register or map those images to a common coordinate system, stack
> them, and average them.  I must recognize:
>
> 1. there are transient bad pixels due to cosmic-ray hits on the CCD array
> 2. some pixels are permanently dead and always produce bad data
> 3. some data in the image is not part of the planet
> 4. some images have too much atmospheric blurring to be used
> 5. sometimes moons of planets like Jupiter pass in front
> 6. etc.
>
> For each original image:
>  Make a mask where 1 is good and 0 is bad.  This trims both bad
>      pixels and pixels that are not part of the planet.
>  Evaluate the image and zero it entirely (or zero regions of it)
>      if quality is low (or there is an obstruction).
>  Shift (or map, in the case of lon-lat mapping) the image AND THE
>      MASK to a common coordinate grid.
>  Append the image and the mask to the ends of the stacks of prior
>      images and masks.
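[Editor's sketch, not part of the original post: the key point in the steps above is that any geometric transform applied to the data must be applied identically to its mask. The names below are illustrative, and `np.roll` stands in for a real registration step, which would normally involve sub-pixel interpolation.]

```python
import numpy as np

# A toy image with one known-bad pixel, and its mask (1 = good, 0 = bad).
image = np.arange(16.0).reshape(4, 4)
mask = np.ones((4, 4))
mask[0, 0] = 0.0  # flag a bad pixel

# Apply the SAME shift to the image and the mask, so the bad-pixel flag
# travels with the data it describes.  (Integer shift for illustration only.)
dy, dx = 1, 2
shifted_image = np.roll(image, (dy, dx), axis=(0, 1))
shifted_mask = np.roll(mask, (dy, dx), axis=(0, 1))
```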
>
> For each x and y position in the image/mask stacks:
>  Perform some averaging of all pixels in the stack that are not masked,
>      or otherwise use the mask info to decide how to use the image info.
>
> The most common of these averages is to multiply the stack of images
> by the stack of masks, zeroing the bad data, then sum both images and
> masks down the third axis.  This produces an accumulator image and a
> count image, where the count image says how many data points went into
> the accumulator at each pixel position.  Divide the accumulator by the
> count.  This produces an image with no missing data and the highest
> SNR per pixel possible.
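[Editor's sketch, not part of the original post: the accumulate-and-divide average described above, written as a minimal NumPy fragment. Array names and shapes are illustrative; stacks are (n_frames, ny, nx).]

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.normal(loc=10.0, scale=1.0, size=(5, 4, 4))  # image stack
masks = rng.integers(0, 2, size=(5, 4, 4)).astype(float)  # 1 = good, 0 = bad

# Multiply by the masks to zero bad data, then sum both stacks down axis 0.
accum = (images * masks).sum(axis=0)  # accumulator image
count = masks.sum(axis=0)             # count image: good samples per pixel

# Divide accumulator by count, guarding pixels where every frame was bad.
with np.errstate(invalid="ignore", divide="ignore"):
    mean = np.where(count > 0, accum / count, 0.0)
```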
>
> Two things differ from the current ma package: the sense of TRUE/FALSE
> and the ability to have numbers other than 0 or 1.  For example, if
> data quality were part of the assessment, each pixel could get a data
> quality number and the accumulation would only include pixels with
> better than a certain number, or there could be weighted averaging.
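[Editor's sketch, not part of the original post: the two variants mentioned above, thresholding on a per-pixel quality number and a quality-weighted average. All names are illustrative.]

```python
import numpy as np

rng = np.random.default_rng(1)
images = rng.normal(10.0, 1.0, size=(5, 4, 4))
quality = rng.uniform(0.0, 1.0, size=(5, 4, 4))  # per-pixel quality in (0, 1)

# Variant 1: include only pixels whose quality exceeds a cutoff.
good = quality > 0.5
count = good.sum(axis=0)
thresh_mean = np.where(count > 0, (images * good).sum(axis=0) / count, 0.0)

# Variant 2: weight each pixel by its quality number.
wsum = quality.sum(axis=0)
weighted_mean = np.where(wsum > 0, (images * quality).sum(axis=0) / wsum, 0.0)
```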
>
> Currently, we just implement this as described, without using the
> masked array capability at all.  However, it's cumbersome to hand
> around pairs of objects, and making a new object to handle them
> together means breaking it apart when calling standard routines that
> don't handle masks.  There are probably some very fast ways to do
> interesting kinds of averages that we haven't explored but would if
> this kind of capability were in the infrastructure.
>
> So, I hope that if some kind of masking goes into the array object, it
> allows TRUE=good, FALSE=bad, and at least an integer count if not a
> floating quality number.  One bit is not enough.  Of course, these
> could be options.  Certainly adding 7 or 31 more bits to the mask will
> make space problems for some, as would flipping the sense of the
> mask.  However, speed matters.  Some of those doing this kind of
> processing are handling hours of staring video data, which is 1e6
> frames a night from each camera.
>
>
This might be described as masks as weights or, from a Bayesian point of
view, as measures of just how unknown the unknown data is. One thing I
like about masks is that they allow ideas like this to creep in and thus
encourage future innovations. But as you point out, 8 bits might be a
bit on the low side for that, and a consistent idea of how to interpret
things would help, à la the missing or unknown models.

Chuck
