Hi all,
I'd like to be able to speed up the following code.
def replace_dead(cube, dead): # cube.shape == (320, 640, 1200) # dead.shape == (320, 640) # cube[i,j,:] are bad points to be replaced via interpolation if dead[i,j] == True
bands = np.arange(0, cube.shape[0]) for line in range(cube.shape[1]): dead_bands = bands[dead[:, line] == True] good_bands = bands[dead[:, line] == False] for sample in range(cube.shape[2]): # interp returns fp[0] for x < xp[0] and fp[1] for x > xp[1] cube[dead_bands, line, sample] = \ np.interp(dead_bands, good_bands, cube[good_bands, line, sample])
Removing that last loop via some sort of broadcasting seems like it should be possible, but that doesn't seem to work with interp. While interp allows the xcoordinates of interpolation points to be a multidimensional array, it expects the x and ycoordinates of the data points to be 1d arrays. Any suggestions for speeding this up?
Thanks, Greg
2010/11/16 greg whittier gregwh@gmail.com:
I'd like to be able to speed up the following code.
def replace_dead(cube, dead): # cube.shape == (320, 640, 1200) # dead.shape == (320, 640) # cube[i,j,:] are bad points to be replaced via interpolation if dead[i,j] == True
bands = np.arange(0, cube.shape[0]) for line in range(cube.shape[1]): dead_bands = bands[dead[:, line] == True] good_bands = bands[dead[:, line] == False] for sample in range(cube.shape[2]): # interp returns fp[0] for x < xp[0] and fp[1] for x > xp[1] cube[dead_bands, line, sample] = \ np.interp(dead_bands, good_bands, cube[good_bands, line, sample])
I assume you just need *some* interpolation, not that specific one? In that case, I'd suggest the following:
1) Use a 2d interpolation, taking into account all nearest neighbours. 2) For this, use a looped interpolation in this nearestneighbour sense: a) Generate sums of all unmasked nearestneighbour values b) Generate counts for the nearest neighbours present c) Replace the bad values by the sums divided by the count. d) Continue at (a) if there are bad values left
Bad values which are neighbouring each other (>= 3) need multiple passes through the loop. It should be pretty fast.
If this is what you have in mind, maybe we (or I) can make up some code.
Friedrich
On Wed, Nov 24, 2010 at 3:16 PM, Friedrich Romstedt < friedrichromstedt@gmail.com> wrote:
2010/11/16 greg whittier gregwh@gmail.com:
I'd like to be able to speed up the following code.
def replace_dead(cube, dead): # cube.shape == (320, 640, 1200) # dead.shape == (320, 640) # cube[i,j,:] are bad points to be replaced via interpolation if dead[i,j] == True
bands = np.arange(0, cube.shape[0]) for line in range(cube.shape[1]): dead_bands = bands[dead[:, line] == True] good_bands = bands[dead[:, line] == False] for sample in range(cube.shape[2]): # interp returns fp[0] for x < xp[0] and fp[1] for x > xp[1] cube[dead_bands, line, sample] = \ np.interp(dead_bands, good_bands, cube[good_bands, line, sample])
I assume you just need *some* interpolation, not that specific one? In that case, I'd suggest the following:
 Use a 2d interpolation, taking into account all nearest neighbours.
 For this, use a looped interpolation in this nearestneighbour sense: a) Generate sums of all unmasked nearestneighbour values b) Generate counts for the nearest neighbours present c) Replace the bad values by the sums divided by the count. d) Continue at (a) if there are bad values left
Bad values which are neighbouring each other (>= 3) need multiple passes through the loop. It should be pretty fast.
If this is what you have in mind, maybe we (or I) can make up some code.
Friedrich
Thanks so much for the response! Sorry I didn't respond earlier. I put it aside until I found time to try and understand part 2 of your response and forgot about it. I'm not really looking for 2d interpolation at the moment, but I can see needing it in the future. Right now, I just want to interpolate along one of the three axes. I think what you're suggesting might work for 1d or 2d depending on how you find the nearest neighbors. What routine would you use? Also, when you say "unmasked" do you mean literally using masked arrays?
Thanks, Greg
2010/12/1 greg whittier gregwh@gmail.com:
On Wed, Nov 24, 2010 at 3:16 PM, Friedrich Romstedt friedrichromstedt@gmail.com wrote:
I assume you just need *some* interpolation, not that specific one? In that case, I'd suggest the following:
 Use a 2d interpolation, taking into account all nearest neighbours.
 For this, use a looped interpolation in this nearestneighbour sense:
a) Generate sums of all unmasked nearestneighbour values b) Generate counts for the nearest neighbours present c) Replace the bad values by the sums divided by the count. d) Continue at (a) if there are bad values left
Bad values which are neighbouring each other (>= 3) need multiple passes through the loop. It should be pretty fast.
If this is what you have in mind, maybe we (or I) can make up some code.
Friedrich
Thanks so much for the response! Sorry I didn't respond earlier. I put it aside until I found time to try and understand part 2 of your response and forgot about it. I'm not really looking for 2d interpolation at the moment, but I can see needing it in the future. Right now, I just want to interpolate along one of the three axes. I think what you're suggesting might work for 1d or 2d depending on how you find the nearest neighbors. What routine would you use? Also, when you say "unmasked" do you mean literally using masked arrays?
Hi Greg,
if you can estimate that you'll need a more sophisticated algorithm in future I'd recommend to write it in full glory, in a general way, in the end it'll save you time (this is what I would do).
Yes, you're right, by choosing just neighbours along one axis you could do simple oneaxis interpolation, but in some corner cases it'll not work properly since it will work the following (some ascii graphics):
"x" are present values, "" are missing values. The chain might look like the following:
xxxxxxxx
In this case, interpolation will work. It'll pick the two neighbours, and interpolate them. But consider this:
xxxxxxxx
This will just propagate the end points to the neighbours. The missing points will have just one neighbour, hence this behaviour.
After the propagation, all values are filled, and you end up with a step in the middle.
If such neighbouring missing data points are rare, it might still be considerable over Python loops with numpy.interp(). I don't see a way to vectorize interp(), since the run lengthes are different in each case. You might consider writing a C or Cython function, but I cannot give any advise with this.
I'm thinking about a way to propagate the values over more than one step. You might know that interpolation (in images) uses also kernels extending beyond the next neighbours. But I don't know precisely how to design them.
First, I'd like to know if you have or have not such neighbouring missing data points.
And why do you prefer interpolation in only one axis?
I can help with the code, but I'd prefer to do it the following way: You write the code, and when you're stuck, seriously, you write back to the list. I'm sure I could do the code, but 1) it might (might?) save me time, 2) You might profit from doing it yourself :)
Would you mind putting the code online in a github repo? Might well be that I sometimes run across a similar problem.
Considering your masking question, I would keep the mask array separate, but this is rather because I'm not familiar with masked arrays.
Another thing which comes into my mind would be to rewrite or write a new interp() which takes care of masked entries, but it would be quite an amount of work for me (I'm not familiar with the C interior of numpy either). And it would be restricted to one dimension only.
If you can please give more detail on you data, where it comes from etc.
Friedrich
participants (2)

Friedrich Romstedt

greg whittier