[Numpy-discussion] Missing/accumulating data
Charles R Harris
charlesr.harris at gmail.com
Tue Jun 28 13:34:38 EDT 2011
On Tue, Jun 28, 2011 at 10:50 AM, Joe Harrington <jh at physics.ucf.edu> wrote:
> As with Travis, I have not had time to wade through the 150+ messages
> on masked arrays, but I'd like to raise a concept I've mentioned in
> the past that would enable a broader use if done slightly differently.
> That is, the "masked array" problem is a subset of this more-general
> problem. Please respond on this thread if you want me to see the
> replies.
>
> The problem is of data accumulation and averaging, and it comes up all
> the time in astronomy, medical imaging, and other fields that handle
> stacks of images. Say that I observe a planet, taking images at, say,
> 30 Hz for several hours, as with many video cameras. I want to
> register or map those images to a common coordinate system, stack
> them, and average them. I must recognize:
>
> 1. transient there are bad pixels due to cosmic ray hits on the CCD array
> 2. some pixels are permanently dead and always produce bad data
> 3. some data in the image is not part of the planet
> 4. some images have too much atmospheric blurring to be used
> 5. sometimes moons of planets like Jupiter pass in front
> 6. etc.
>
> For each original image:
> Make a mask where 1 is good and 0 is bad. This trims both bad
> pixels and pixels that are not part of the planet.
> Evaluate the images and zero the entire image (or regions of an
> image) if quality is low (or there is an obstruction).
> Shift (or map, in the case of lon-lat mapping) the image AND THE
> MASK to a common coordinate grid.
> Append the image and the mask to the ends of the stacks of prior
> images and masks.
>
> For each x and y position in the image/mask stacks:
> Perform some averaging of all pixels in the stack that are not masked,
> or otherwise use the mask info to decide how to use the image info.
>
> The most common of these averages is to multiply the stack of images
> by the stack of masks, zeroing the bad data, then sum both images and
> masks down the third axis. This produces an accumulator image and a
> count image, where the count image says how many data points went into
> the accumulator at each pixel position. Divide the accumulator by the
> count. This produces an image with no missing data and the highest
> SNR per pixel possible.
>
> Two things differ from the current ma package: the sense of TRUE/FALSE
> and the ability to have numbers other than 0 or 1. For example, if
> data quality were part of the assessment, each pixel could get a data
> quality number and the accumulation would only include pixels with
> better than a certain number, or there could be weighted averaging.
>
> Currently, we just implement this as described, without using the
> masked array capability at all. However, it's cumbersome to hand
> around pairs of objects, and making a new object to handle them
> together means breaking it apart when calling standard routines that
> don't handle masks. There are probably some very fast ways to do
> interesting kinds of averages that we haven't explored but would if
> this kind of capability were in the infrastructure.
>
> So, I hope that if some kind of masking goes into the array object, it
> allows TRUE=good, FALSE=bad, and at least an integer count if not a
> floating quality number. One bit is not enough. Of course, these
> could be options. Certainly adding 7 or 31 more bits to the mask will
> make space problems for some, as would flipping the sense of the
> mask. However, speed matters. Some of those doing this kind of
> processing are handling hours of staring video data, which is 1e6
> frames a night from each camera.
>
>
This might be described as masks as weights, or perhaps from the Bayesian
point of view, as measures of just how unknown unknown data is. One thing I
like about masks is that they allow ideas like this to creep in and thus
encourage future innovations. But as you also point out, 8 bits might be a
bit on the low side for that and a consistent idea of how to interpret
things would help ala the missing or unknown models.
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20110628/381c0028/attachment.html>
More information about the NumPy-Discussion
mailing list