
Hi all,

I'd like to be able to speed up the following code.

    import numpy as np

    def replace_dead(cube, dead):
        # cube.shape == (320, 640, 1200)
        # dead.shape == (320, 640)
        # cube[i,j,:] are bad points to be replaced via interpolation if dead[i,j] == True
        bands = np.arange(0, cube.shape[0])
        for line in range(cube.shape[1]):
            dead_bands = bands[dead[:, line] == True]
            good_bands = bands[dead[:, line] == False]
            for sample in range(cube.shape[2]):
                # interp returns fp[0] for x < xp[0] and fp[-1] for x > xp[-1]
                cube[dead_bands, line, sample] = \
                    np.interp(dead_bands, good_bands, cube[good_bands, line, sample])

Removing that last loop via some sort of broadcasting seems like it should be possible, but that doesn't seem to work with interp. While interp allows the x-coordinates of interpolation points to be a multi-dimensional array, it expects the x- and y-coordinates of the data points to be 1-d arrays.

Any suggestions for speeding this up?

Thanks,
Greg
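[Editor's sketch, not from the thread.] One possible way to drop the inner loop over samples: since the dead and good bands are the same for every sample in a line, the bracketing indices and linear weights can be computed once per line with np.searchsorted and then broadcast across the sample axis. The function name and variable names below are illustrative only:

    import numpy as np

    def replace_dead_broadcast(cube, dead):
        # Same intent as replace_dead above, but the bracketing good bands and
        # linear weights are computed once per line and broadcast over the
        # whole sample axis instead of looping over it.
        bands = np.arange(cube.shape[0])
        for line in range(cube.shape[1]):
            good_bands = bands[~dead[:, line]]
            dead_bands = bands[dead[:, line]]
            if dead_bands.size == 0 or good_bands.size == 0:
                continue  # nothing to fix, or nothing to interpolate from
            # positions of each dead band within the sorted good bands
            idx = np.searchsorted(good_bands, dead_bands)
            lo = np.clip(idx - 1, 0, good_bands.size - 1)
            hi = np.clip(idx, 0, good_bands.size - 1)
            x0, x1 = good_bands[lo], good_bands[hi]
            # linear weights; clipping to [0, 1] reproduces np.interp's
            # behaviour of returning fp[0] below xp[0] and fp[-1] above xp[-1]
            w = np.clip((dead_bands - x0) / np.where(x1 > x0, x1 - x0, 1), 0.0, 1.0)
            cube[dead_bands, line, :] = ((1.0 - w)[:, None] * cube[x0, line, :]
                                         + w[:, None] * cube[x1, line, :])
        return cube

The same idea could also be expressed with scipy.interpolate.interp1d, which accepts a 2-D fp array and an axis argument, though its handling of points outside the data range differs from np.interp's clamping and would need extra care.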

2010/11/16 greg whittier <gregwh@gmail.com>:
I'd like to be able to speed up the following code.
    def replace_dead(cube, dead):
        # cube.shape == (320, 640, 1200)
        # dead.shape == (320, 640)
        # cube[i,j,:] are bad points to be replaced via interpolation if dead[i,j] == True
        bands = np.arange(0, cube.shape[0])
        for line in range(cube.shape[1]):
            dead_bands = bands[dead[:, line] == True]
            good_bands = bands[dead[:, line] == False]
            for sample in range(cube.shape[2]):
                # interp returns fp[0] for x < xp[0] and fp[-1] for x > xp[-1]
                cube[dead_bands, line, sample] = \
                    np.interp(dead_bands, good_bands, cube[good_bands, line, sample])
I assume you just need *some* interpolation, not that specific one? In that case, I'd suggest the following:

1) Use a 2d interpolation, taking into account all nearest neighbours.
2) For this, use a looped interpolation in this nearest-neighbour sense:
   a) Generate sums of all unmasked nearest-neighbour values.
   b) Generate counts for the nearest neighbours present.
   c) Replace the bad values by the sums divided by the counts.
   d) Continue at (a) if there are bad values left.

Bad values that neighbour each other (runs of three or more) need multiple passes through the loop. It should be pretty fast.

If this is what you have in mind, maybe we (or I) can make up some code.

Friedrich
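[Editor's sketch, not from the thread.] A minimal version of the iterative neighbour-averaging scheme described above, assuming a 2-D array, a boolean bad-pixel mask, and a 4-neighbour stencil (the function name and the max_passes guard are my own additions):

    import numpy as np

    def fill_bad_by_neighbour_mean(img, bad, max_passes=100):
        """Iteratively replace bad pixels of a 2-D array by the mean of their
        valid 4-neighbours, repeating until no bad pixels remain."""
        img = img.astype(float).copy()
        bad = bad.copy()
        for _ in range(max_passes):
            if not bad.any():
                break
            good = ~bad
            vals = np.where(good, img, 0.0)   # contributions of good pixels only
            cnt = good.astype(float)          # 1 where a pixel is present
            nb_sum = np.zeros_like(img)       # (a) sums of present neighbour values
            nb_cnt = np.zeros_like(img)       # (b) counts of present neighbours
            nb_sum[1:, :] += vals[:-1, :]
            nb_cnt[1:, :] += cnt[:-1, :]
            nb_sum[:-1, :] += vals[1:, :]
            nb_cnt[:-1, :] += cnt[1:, :]
            nb_sum[:, 1:] += vals[:, :-1]
            nb_cnt[:, 1:] += cnt[:, :-1]
            nb_sum[:, :-1] += vals[:, 1:]
            nb_cnt[:, :-1] += cnt[:, 1:]
            fixable = bad & (nb_cnt > 0)
            if not fixable.any():
                break                         # e.g. everything is bad
            img[fixable] = nb_sum[fixable] / nb_cnt[fixable]   # (c) sum / count
            bad[fixable] = False              # (d) loop again if bad values remain
        return img

For the cube in the original post this could be applied per sample to the (band, line) plane, since the dead mask only depends on those two axes.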

On Wed, Nov 24, 2010 at 3:16 PM, Friedrich Romstedt <friedrichromstedt@gmail.com> wrote:
2010/11/16 greg whittier <gregwh@gmail.com>:
I'd like to be able to speed up the following code.
    def replace_dead(cube, dead):
        # cube.shape == (320, 640, 1200)
        # dead.shape == (320, 640)
        # cube[i,j,:] are bad points to be replaced via interpolation if dead[i,j] == True
        bands = np.arange(0, cube.shape[0])
        for line in range(cube.shape[1]):
            dead_bands = bands[dead[:, line] == True]
            good_bands = bands[dead[:, line] == False]
            for sample in range(cube.shape[2]):
                # interp returns fp[0] for x < xp[0] and fp[-1] for x > xp[-1]
                cube[dead_bands, line, sample] = \
                    np.interp(dead_bands, good_bands, cube[good_bands, line, sample])
I assume you just need *some* interpolation, not that specific one? In that case, I'd suggest the following:
1) Use a 2d interpolation, taking into account all nearest neighbours.
2) For this, use a looped interpolation in this nearest-neighbour sense:
   a) Generate sums of all unmasked nearest-neighbour values.
   b) Generate counts for the nearest neighbours present.
   c) Replace the bad values by the sums divided by the counts.
   d) Continue at (a) if there are bad values left.
Bad values that neighbour each other (runs of three or more) need multiple passes through the loop. It should be pretty fast.
If this is what you have in mind, maybe we (or I) can make up some code.
Friedrich
Thanks so much for the response! Sorry I didn't respond earlier. I put it aside until I found time to try and understand part 2 of your response and forgot about it.

I'm not really looking for 2d interpolation at the moment, but I can see needing it in the future. Right now, I just want to interpolate along one of the three axes. I think what you're suggesting might work for 1d or 2d depending on how you find the nearest neighbors. What routine would you use? Also, when you say "unmasked" do you mean literally using masked arrays?

Thanks,
Greg

2010/12/1 greg whittier <gregwh@gmail.com>:
On Wed, Nov 24, 2010 at 3:16 PM, Friedrich Romstedt <friedrichromstedt@gmail.com> wrote:
I assume you just need *some* interpolation, not that specific one? In that case, I'd suggest the following:
1) Use a 2d interpolation, taking into account all nearest neighbours.
2) For this, use a looped interpolation in this nearest-neighbour sense:
   a) Generate sums of all unmasked nearest-neighbour values.
   b) Generate counts for the nearest neighbours present.
   c) Replace the bad values by the sums divided by the counts.
   d) Continue at (a) if there are bad values left.
Bad values that neighbour each other (runs of three or more) need multiple passes through the loop. It should be pretty fast.
If this is what you have in mind, maybe we (or I) can make up some code.
Friedrich
Thanks so much for the response! Sorry I didn't respond earlier. I put it aside until I found time to try and understand part 2 of your response and forgot about it. I'm not really looking for 2d interpolation at the moment, but I can see needing it in the future. Right now, I just want to interpolate along one of the three axes. I think what you're suggesting might work for 1d or 2d depending on how you find the nearest neighbors. What routine would you use? Also, when you say "unmasked" do you mean literally using masked arrays?
Hi Greg,

If you can estimate that you'll need a more sophisticated algorithm in the future, I'd recommend writing it in full glory, in a general way; in the end it'll save you time (this is what I would do).

Yes, you're right, by choosing just the neighbours along one axis you could do simple one-axis interpolation, but in some corner cases it won't work properly. It behaves as follows (some ascii graphics), where "x" are present values and "-" are missing values. The chain might look like this:

xxxx-xxxx

In this case, interpolation will work: it'll pick the two neighbours and interpolate between them. But consider this:

xxxx--xxxx

This will just propagate the end points to their neighbours. Each missing point has only one present neighbour, hence this behaviour. After the propagation all values are filled, and you end up with a step in the middle.

If such neighbouring missing data points are rare, it might still be a considerable improvement over Python loops with numpy.interp(). I don't see a way to vectorize interp(), since the run lengths are different in each case. You might consider writing a C or Cython function, but I cannot give any advice on that.

I'm thinking about a way to propagate the values over more than one step. You might know that interpolation (in images) also uses kernels extending beyond the next neighbours, but I don't know precisely how to design them. First, I'd like to know whether or not you have such neighbouring missing data points. And why do you prefer interpolation along only one axis?

I can help with the code, but I'd prefer to do it the following way: you write the code, and when you're stuck, seriously, you write back to the list. I'm sure I could do the code, but 1) it might (might?) save me time, and 2) you might profit from doing it yourself :-) Would you mind putting the code online in a github repo? It might well be that I sometimes run across a similar problem.

Regarding your masking question, I would keep the mask array separate, but this is rather because I'm not familiar with masked arrays. Another thing that comes to mind would be to rewrite interp(), or write a new interp(), that takes care of masked entries, but that would be quite an amount of work for me (I'm not familiar with the C interior of numpy either), and it would be restricted to one dimension only.

If you can, please give more detail on your data, where it comes from, etc.

Friedrich
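[Editor's sketch, not from the thread.] A small numerical illustration of the xxxx--xxxx corner case described above (the values and names are made up for the example): one pass of single-axis nearest-neighbour averaging copies the end points inwards and leaves a step, while np.interp fills the gap with a linear ramp.

    import numpy as np

    # The "true" signal is a straight line 0..9, with two adjacent missing values.
    y = np.arange(10.0)
    bad = np.zeros(10, dtype=bool)
    bad[4:6] = True                     # pattern: xxxx--xxxx

    # One pass of single-axis nearest-neighbour averaging: each missing point
    # sees only one present neighbour, so the end points simply propagate inwards.
    averaged = y.copy()
    averaged[4] = y[3]                  # -> 3.0
    averaged[5] = y[6]                  # -> 6.0, leaving a step of 3 in the middle

    # np.interp instead gives the linear ramp through the gap.
    x = np.arange(10)
    interped = np.interp(x[bad], x[~bad], y[~bad])

    print(averaged[4:6])                # [3. 6.]
    print(interped)                     # [4. 5.]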