[Numpy-discussion] broadcasting with numpy.interp
friedrichromstedt at gmail.com
Thu Dec 2 07:04:46 EST 2010
2010/12/1 greg whittier <gregwh at gmail.com>:
> On Wed, Nov 24, 2010 at 3:16 PM, Friedrich Romstedt
> <friedrichromstedt at gmail.com> wrote:
>> I assume you just need *some* interpolation, not that specific one?
>> In that case, I'd suggest the following:
>> 1) Use a 2d interpolation, taking into account all nearest neighbours.
>> 2) For this, use a looped interpolation in this nearest-neighbour sense:
>> a) Generate sums of all unmasked nearest-neighbour values
>> b) Generate counts for the nearest neighbours present
>> c) Replace the bad values by the sums divided by the count.
>> d) Continue at (a) if there are bad values left
>> Bad values which are neighbouring each other (>= 3) need multiple
>> passes through the loop. It should be pretty fast.
>> If this is what you have in mind, maybe we (or I) can make up some code.
> Thanks so much for the response! Sorry I didn't respond earlier. I put it
> aside until I found time to try and understand part 2 of your response and
> forgot about it. I'm not really looking for 2d interpolation at the moment,
> but I can see needing it in the future. Right now, I just want to
> interpolate along one of the three axes. I think what you're suggesting
> might work for 1d or 2d depending on how you find the nearest neighbors.
> What routine would you use? Also, when you say "unmasked" do you mean
> literally using masked arrays?
if you can estimate that you'll need a more sophisticated algorithm in
future I'd recommend to write it in full glory, in a general way, in
the end it'll save you time (this is what I would do).
Yes, you're right, by choosing just neighbours along one axis you
could do simple one-axis interpolation, but in some corner cases it'll
not work properly since it will work the following (some ascii
"x" are present values, "-" are missing values. The chain might look
like the following:
In this case, interpolation will work. It'll pick the two neighbours,
and interpolate them. But consider this:
This will just propagate the end points to the neighbours. The
missing points will have just one neighbour, hence this behaviour.
After the propagation, all values are filled, and you end up with a
step in the middle.
If such neighbouring missing data points are rare, it might still be
considerable over Python loops with numpy.interp(). I don't see a way
to vectorize interp(), since the run lengthes are different in each
case. You might consider writing a C or Cython function, but I cannot
give any advise with this.
I'm thinking about a way to propagate the values over more than one
step. You might know that interpolation (in images) uses also kernels
extending beyond the next neighbours. But I don't know precisely how
to design them.
First, I'd like to know if you have or have not such neighbouring
missing data points.
And why do you prefer interpolation in only one axis?
I can help with the code, but I'd prefer to do it the following way:
You write the code, and when you're stuck, seriously, you write back
to the list. I'm sure I could do the code, but 1) it might (might?)
save me time, 2) You might profit from doing it yourself :-)
Would you mind putting the code online in a github repo? Might well
be that I sometimes run across a similar problem.
Considering your masking question, I would keep the mask array
separate, but this is rather because I'm not familiar with masked
Another thing which comes into my mind would be to rewrite or write a
new interp() which takes care of masked entries, but it would be quite
an amount of work for me (I'm not familiar with the C interior of
numpy either). And it would be restricted to one dimension only.
If you can please give more detail on you data, where it comes from etc.
More information about the NumPy-Discussion