the neighbourhood of each element of an array
Hi, Given a (possibly masked) 2d array x, is there a fast(er) way in Numpy to obtain the same result as the following few lines? d = 1 # neighbourhood 'radius' Nrow = x.shape[0] Ncol = x.shape[1] y = array([[x[i-d:i+d+1,j-d:j+d+1].ravel() for j in range(d,Ncol-d)] \ for i in range(d,Nrow-d)]) What you get is an array containing all the elements in a neighbourhood for each element, disregarding the edges to avoid out-of-range problems. The code above becomes quite slow for e.g. a 2000x2000 array. Does anyone know a better approach? Ciao, Joris Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
A Divendres 23 Febrer 2007 17:38, joris@ster.kuleuven.ac.be escrigué:
Hi,
Given a (possibly masked) 2d array x, is there a fast(er) way in Numpy to obtain the same result as the following few lines?
d = 1 # neighbourhood 'radius' Nrow = x.shape[0] Ncol = x.shape[1] y = array([[x[i-d:i+d+1,j-d:j+d+1].ravel() for j in range(d,Ncol-d)] \ for i in range(d,Nrow-d)])
What you get is an array containing all the elements in a neighbourhood for each element, disregarding the edges to avoid out-of-range problems. The code above becomes quite slow for e.g. a 2000x2000 array. Does anyone know a better approach?
Well, it seems that copying data here is taking most of the CPU. Perhaps you may want to try getting *references* to the original slices better. For example, if rd = 2+d, you can write: def get_neighbors_views_ravel(x): # The next is for an array of references to *views* of neighborgs y = numpy.empty((Nrow-2*d, Ncol-2*d), dtype='object') for i in xrange(0, Nrow-2*d): x2 = x[i:i+rd] # Get a view of the first dimension slice for j in xrange(0, Ncol-2*d): y[i, j] = x2[:,j:j+rd].ravel() return y which is a 1.34x (on my machine) faster than your current approach. If you want more speed, you may want to not .ravel() in the new array creation time (you can always use .ravel() when you are going to use the data). The removal of the .ravel() call makes the above function 2.56x faster. Finally, if your machine has an x86 architecture, you can also take advantge of Psyco so as to accelerate a bit more. With Psyco and not raveling, you can get up to 3x better times than your original approach (without using Psyco). Of course, more speed-ups could be possible by using Pyrex or any other easy method for doing C-extensions. I'm attaching a small benchmark, and here are the results for my machine: ref time--> 3.021 views (ravel) time--> 2.258 speed-up--> 1.34 views (no ravel) time--> 1.179 speed-up--> 2.56 and if we use psyco: ref time--> 2.368 views (ravel) time--> 1.636 speed-up--> 1.45 views (no ravel) time--> 0.935 speed-up--> 2.53 Cheers, --
0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"
Thanks for all the useful comments. Some feedback about the improved version of my code snippet. For a 40x40 matrix and d=1 the new version is 44 times faster, and for d=2 it's 27 times faster. For my astronomical images (typical 2000x2000) the new version saves my day. # improved version d = 1 Nrow = x.shape[0] Ncol = x.shape[1] s = 2*d+1 y = empty((s*s,Nrow-2*d,Ncol-2*d), dtype=x.dtype) for i in xrange(-d,d+1): for j in xrange(-d,d+1): y[(j+d)+s*(i+d)]= x[d+i:Nrow-d+i,d+j:Ncol-d+j] y = y.swapaxes(0,2).swapaxes(0,1) The suggestion of Fransesc is a really cool one. But combining it with the suggestion of Bryan does not seem possible in this particular case as the swapaxis operations would no longer be possible as the resulting array would be one with shape (s*s,) containing objects with shape (Nrow-2*d,Ncol-2*d). Cheers, Joris Quoting Francesc Altet <faltet@carabos.com>:
A Divendres 23 Febrer 2007 17:38, joris@ster.kuleuven.ac.be escrigué:
Hi,
Given a (possibly masked) 2d array x, is there a fast(er) way in Numpy to obtain the same result as the following few lines?
d = 1 # neighbourhood 'radius' Nrow = x.shape[0] Ncol = x.shape[1] y = array([[x[i-d:i+d+1,j-d:j+d+1].ravel() for j in range(d,Ncol-d)] \ for i in range(d,Nrow-d)])
What you get is an array containing all the elements in a neighbourhood for each element, disregarding the edges to avoid out-of-range problems. The code above becomes quite slow for e.g. a 2000x2000 array. Does anyone know a better approach?
Well, it seems that copying data here is taking most of the CPU. Perhaps you
may want to try getting *references* to the original slices better. For example, if rd = 2+d, you can write:
def get_neighbors_views_ravel(x): # The next is for an array of references to *views* of neighborgs y = numpy.empty((Nrow-2*d, Ncol-2*d), dtype='object')
for i in xrange(0, Nrow-2*d): x2 = x[i:i+rd] # Get a view of the first dimension slice for j in xrange(0, Ncol-2*d): y[i, j] = x2[:,j:j+rd].ravel() return y
which is a 1.34x (on my machine) faster than your current approach.
If you want more speed, you may want to not .ravel() in the new array creation time (you can always use .ravel() when you are going to use the data). The removal of the .ravel() call makes the above function 2.56x faster.
Finally, if your machine has an x86 architecture, you can also take advantge
of Psyco so as to accelerate a bit more. With Psyco and not raveling, you can
get up to 3x better times than your original approach (without using Psyco).
Of course, more speed-ups could be possible by using Pyrex or any other easy
method for doing C-extensions.
I'm attaching a small benchmark, and here are the results for my machine:
ref time--> 3.021 views (ravel) time--> 2.258 speed-up--> 1.34 views (no ravel) time--> 1.179 speed-up--> 2.56
and if we use psyco:
ref time--> 2.368 views (ravel) time--> 1.636 speed-up--> 1.45 views (no ravel) time--> 0.935 speed-up--> 2.53
Cheers,
--
0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
On Fri, 2007-02-23 at 17:38 +0100, joris@ster.kuleuven.ac.be wrote:
Hi,
Given a (possibly masked) 2d array x, is there a fast(er) way in Numpy to obtain the same result as the following few lines?
d = 1 # neighbourhood 'radius' Nrow = x.shape[0] Ncol = x.shape[1] y = array([[x[i-d:i+d+1,j-d:j+d+1].ravel() for j in range(d,Ncol-d)] \ for i in range(d,Nrow-d)])
how about something like er = Nrow - d ec = Ncol - d y = array([x[i:er+i, j:ec+j] for j in arange(-d,d) for i in arange(-d,d)]) now you're looping over a small array and combining slices of the big array (as opposed to looping over the big array and combining slices from a small one). This should be faster for large Nrow, Ncol. BC
What you get is an array containing all the elements in a neighbourhood for each element, disregarding the edges to avoid out-of-range problems. The code above becomes quite slow for e.g. a 2000x2000 array. Does anyone know a better approach?
Ciao, Joris
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
Scipy's ndimage module has a function that takes a generic callback and calls it with the values of each neighborhood (of a given size, and optionally with a particular "mask" footprint) centered on each array element. That function handles boundary conditions, etc nicely. Unfortunately, I'm not sure if it works with masked arrays, and I think it hands a ravel'd set of pixels back to the callback function. You could probably hack masking in there by passing it the mask concatenated to the array, and then deal with the mask explicitly. Zach On Feb 23, 2007, at 11:39 AM, Bryan Cole wrote:
On Fri, 2007-02-23 at 17:38 +0100, joris@ster.kuleuven.ac.be wrote:
Hi,
Given a (possibly masked) 2d array x, is there a fast(er) way in Numpy to obtain the same result as the following few lines?
d = 1 # neighbourhood 'radius' Nrow = x.shape[0] Ncol = x.shape[1] y = array([[x[i-d:i+d+1,j-d:j+d+1].ravel() for j in range(d,Ncol- d)] \ for i in range(d,Nrow-d)])
how about something like
er = Nrow - d ec = Ncol - d y = array([x[i:er+i, j:ec+j] for j in arange(-d,d) for i in arange(-d,d)])
now you're looping over a small array and combining slices of the big array (as opposed to looping over the big array and combining slices from a small one). This should be faster for large Nrow, Ncol.
BC
What you get is an array containing all the elements in a neighbourhood for each element, disregarding the edges to avoid out-of-range problems. The code above becomes quite slow for e.g. a 2000x2000 array. Does anyone know a better approach?
Ciao, Joris
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
On Friday 23 February 2007 14:53:05 Zachary Pincus wrote:
Scipy's ndimage module has a function that takes a generic callback and calls it with the values of each neighborhood (of a given size, and optionally with a particular "mask" footprint) centered on each array element. That function handles boundary conditions, etc nicely.
Unfortunately, I'm not sure if it works with masked arrays, and I think it hands a ravel'd set of pixels back to the callback function. You could probably hack masking in there by passing it the mask concatenated to the array, and then deal with the mask explicitly.
Without really thinking about it: The easiest would be to process the masked array in steps: * process the _data part of the maskedarray (or its filled version) with the function: that will be your new _data. * if the mask is not nomask, process the _mask part of the maskedarray to get a new _mask * Set to True any element of the new mask that contains a True value: in other terms, mask the values that have a masked neighbor.
participants (5)
-
Bryan Cole
-
Francesc Altet
-
joris@ster.kuleuven.ac.be
-
Pierre GM
-
Zachary Pincus