Efficient numpy slicing for a "sliding window approach".

Dear all, I'm trying to optimize the code below and I was wondering if there is an efficient method that could reduce the numpy slicing overheard without going with cython. Is there anyway I could use mgrid to get a matrix with all my "windows" and then do a large matrix multiply instead? Any idea? Thanks in advance. Best regards, -- Nicolas Pinto Ph.D. Candidate, Brain & Computer Sciences Massachusetts Institute of Technology, USA http://web.mit.edu/pinto ======================================== import numpy as np from numpy import dot arrh, arrw, arrd = 480,640,96 arr = np.random.randn(arrh, arrw, arrd).astype("float32") stride = 16 winh, winw, wind = 128,64,96 limit = 100 clas_w = np.random.randn(8,4,96).astype("float32").ravel() @profile def func(): nh, nw = arrh-winh+1, arrw-winw+1 rngh = np.arange(nh) rngw = np.arange(nw) np.random.shuffle(rngh) np.random.shuffle(rngw) rngh = rngh[:limit] rngw = rngw[:limit] rngh = np.sort(rngh) rngw = np.sort(rngw) resps = np.empty((nh,nw), dtype="float32") for j in rngh: for i in rngw: win = arr[j:j+winh:stride, i:i+winw:stride].ravel() resp = dot(win, clas_w) resps[j,i] = resp resps = resps.ravel() # ... func() ======================================== % python kernprof.py -l -v sliding_win.py Wrote profile results to sliding_win.py.lprof Timer unit: 1e-06 s File: sliding_win.py Function: func at line 14 Total time: 0.224315 s Line # Hits Time Per Hit % Time Line Contents ============================================================== 14 @profile 15 def func(): 16 1 7 7.0 0.0 nh, nw = arrh-winh+1, arrw-winw+1 17 1 14 14.0 0.0 rngh = np.arange(nh) 18 1 4 4.0 0.0 rngw = np.arange(nw) 19 20 1 89 89.0 0.0 np.random.shuffle(rngh) 21 1 124 124.0 0.1 np.random.shuffle(rngw) 22 23 1 3 3.0 0.0 rngh = rngh[:limit] 24 1 2 2.0 0.0 rngw = rngw[:limit] 25 26 1 39 39.0 0.0 rngh = np.sort(rngh) 27 1 19 19.0 0.0 rngw = np.sort(rngw) 28 1 24 24.0 0.0 resps = np.empty((nh,nw), dtype="float32") 29 101 127 1.3 0.1 for j in rngh: 30 10100 12636 1.3 5.6 for i in rngw: 31 10000 118190 11.8 52.7 win = arr[j:j+winh:stride, i:i+winw:stride].ravel() 32 10000 73854 7.4 32.9 resp = dot(win, clas_w) 33 10000 19180 1.9 8.6 resps[j,i] = resp 34 35 1 3 3.0 0.0 resps = resps.ravel()

On Sat, Feb 21, 2009 at 1:46 PM, Nicolas Pinto <pinto@mit.edu> wrote:
Dear all,
I'm trying to optimize the code below and I was wondering if there is an efficient method that could reduce the numpy slicing overheard without going with cython. Is there anyway I could use mgrid to get a matrix with all my "windows" and then do a large matrix multiply instead?
If you only care about removing the two loops for the per-window processing, Anne Archibald and Robert Kern wrote a very useful function, segment_axis, which is like the matlab buffer function on steroids, using numpy stride tricks (to avoid copies in many cases). I think that would do everything you want, right ? I use it a lot in my own code, it may be worth being included in numpy or scipy proper. http://projects.scipy.org/scipy/scikits/browser/trunk/talkbox/scikits/talkbo... David

Thanks a lot for the pointer to segmentaxis. I'm trying to use it "as is" and it seems that I need to a big reshape before the matrix multiplication. Am I missing something ? ======================================== import numpy as np from numpy import dot, transpose arrh, arrw, arrd = 480,640,96 arr = np.random.randn(arrh, arrw, arrd).astype("float32") stride = 16 winh, winw, wind = 128,64,96 limit = 100 clas_w = np.random.randn(8,4,96).astype("float32").ravel() from segmentaxis import segment_axis nh, nw = arrh-winh+1, arrw-winw+1 @profile def func_loop(arr): resps = np.empty((nh,nw), dtype="float32") for j in xrange(nh): for i in xrange(nw): win = arr[j:j+winh:stride, i:i+winw:stride] win = win.ravel() resp = dot(win, clas_w) resps[j,i] = resp resps = resps.ravel() print resps.mean() @profile def func_segment(arr): arr = segment_axis(arr, winh, winh-1, axis=0) arr = arr[:, ::stride] arr = segment_axis(arr, winw, winw-1, axis=2) arr = arr[:, :, :, ::stride] arr = transpose(arr, [0, 2, 1, 3, 4]) arr = arr.reshape(-1, clas_w.size) resps2 = dot(arr, clas_w) resps2 = resps2.ravel() print resps2.mean() # ... func_loop(arr) func_segment(arr) ======================================== % python kernprof.py -l -v sliding_win_all.py w-0.0500485824034 segmentaxis.py:94: UserWarning: Problem with ndarray creation forces copy. warnings.warn("Problem with ndarray creation forces copy.") -0.0500485824034 Wrote profile results to sliding_win_all.py.lprof Timer unit: 1e-06 s File: sliding_win_all.py Function: func_loop at line 18 Total time: 3.9785 s Line # Hits Time Per Hit % Time Line Contents ============================================================== 18 @profile 19 def func_loop(arr): 20 21 1 28 28.0 0.0 resps = np.empty((nh,nw), dtype="float32") 22 354 308 0.9 0.0 for j in xrange(nh): 23 204034 199753 1.0 5.0 for i in xrange(nw): 24 203681 691915 3.4 17.4 win = arr[j:j+winh:stride, i:i+winw:stride] 25 203681 1341174 6.6 33.7 win = win.ravel() 26 203681 1417998 7.0 35.6 resp = dot(win, clas_w) 27 203681 326520 1.6 8.2 resps[j,i] = resp 28 29 1 2 2.0 0.0 resps = resps.ravel() 30 1 805 805.0 0.0 print resps.mean() File: sliding_win_all.py Function: func_segment at line 33 Total time: 3.82026 s Line # Hits Time Per Hit % Time Line Contents ============================================================== 33 @profile 34 def func_segment(arr): 35 36 1 43 43.0 0.0 arr = segment_axis(arr, winh, winh-1, axis=0) 37 1 4 4.0 0.0 arr = arr[:, ::stride] 38 39 1 546505 546505.0 14.3 arr = segment_axis(arr, winw, winw-1, axis=2) 40 1 12 12.0 0.0 arr = arr[:, :, :, ::stride] 41 42 1 12 12.0 0.0 arr = transpose(arr, [0, 2, 1, 3, 4]) 43 1 2435047 2435047.0 63.7 arr = arr.reshape(-1, clas_w.size) 44 45 1 837700 837700.0 21.9 resps2 = dot(arr, clas_w) 46 47 1 41 41.0 0.0 resps2 = resps2.ravel() 48 1 892 892.0 0.0 print resps2.mean() On Fri, Feb 20, 2009 at 11:55 PM, David Cournapeau <cournape@gmail.com>wrote:
On Sat, Feb 21, 2009 at 1:46 PM, Nicolas Pinto <pinto@mit.edu> wrote:
Dear all,
I'm trying to optimize the code below and I was wondering if there is an efficient method that could reduce the numpy slicing overheard without going with cython. Is there anyway I could use mgrid to get a matrix with all my "windows" and then do a large matrix multiply instead?
If you only care about removing the two loops for the per-window processing, Anne Archibald and Robert Kern wrote a very useful function, segment_axis, which is like the matlab buffer function on steroids, using numpy stride tricks (to avoid copies in many cases). I think that would do everything you want, right ? I use it a lot in my own code, it may be worth being included in numpy or scipy proper.
http://projects.scipy.org/scipy/scikits/browser/trunk/talkbox/scikits/talkbo...
David _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Thanks again! -- Nicolas Pinto Ph.D. Candidate, Brain & Computer Sciences Massachusetts Institute of Technology, USA http://web.mit.edu/pinto

On Sat, Feb 21, 2009 at 12:36 AM, Nicolas Pinto <pinto@mit.edu> wrote:
Thanks a lot for the pointer to segmentaxis. I'm trying to use it "as is" and it seems that I need to a big reshape before the matrix multiplication. Am I missing something ?
========================================
import numpy as np from numpy import dot, transpose
arrh, arrw, arrd = 480,640,96 arr = np.random.randn(arrh, arrw, arrd).astype("float32")
stride = 16 winh, winw, wind = 128,64,96
limit = 100
clas_w = np.random.randn(8,4,96).astype("float32").ravel()
from segmentaxis import segment_axis
nh, nw = arrh-winh+1, arrw-winw+1
@profile def func_loop(arr):
resps = np.empty((nh,nw), dtype="float32") for j in xrange(nh): for i in xrange(nw): win = arr[j:j+winh:stride, i:i+winw:stride] win = win.ravel() resp = dot(win, clas_w) resps[j,i] = resp
resps = resps.ravel() print resps.mean()
@profile def func_segment(arr):
arr = segment_axis(arr, winh, winh-1, axis=0) arr = arr[:, ::stride]
arr = segment_axis(arr, winw, winw-1, axis=2) arr = arr[:, :, :, ::stride]
arr = transpose(arr, [0, 2, 1, 3, 4]) arr = arr.reshape(-1, clas_w.size)
resps2 = dot(arr, clas_w)
resps2 = resps2.ravel() print resps2.mean()
# ...
func_loop(arr)
func_segment(arr)
========================================
% python kernprof.py -l -v sliding_win_all.py w-0.0500485824034 segmentaxis.py:94: UserWarning: Problem with ndarray creation forces copy. warnings.warn("Problem with ndarray creation forces copy.") -0.0500485824034 Wrote profile results to sliding_win_all.py.lprof Timer unit: 1e-06 s
File: sliding_win_all.py Function: func_loop at line 18 Total time: 3.9785 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 18 @profile 19 def func_loop(arr): 20 21 1 28 28.0 0.0 resps = np.empty((nh,nw), dtype="float32") 22 354 308 0.9 0.0 for j in xrange(nh): 23 204034 199753 1.0 5.0 for i in xrange(nw): 24 203681 691915 3.4 17.4 win = arr[j:j+winh:stride, i:i+winw:stride] 25 203681 1341174 6.6 33.7 win = win.ravel() 26 203681 1417998 7.0 35.6 resp = dot(win, clas_w) 27 203681 326520 1.6 8.2 resps[j,i] = resp 28 29 1 2 2.0 0.0 resps = resps.ravel() 30 1 805 805.0 0.0 print resps.mean()
File: sliding_win_all.py Function: func_segment at line 33 Total time: 3.82026 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 33 @profile 34 def func_segment(arr): 35 36 1 43 43.0 0.0 arr = segment_axis(arr, winh, winh-1, axis=0) 37 1 4 4.0 0.0 arr = arr[:, ::stride] 38 39 1 546505 546505.0 14.3 arr = segment_axis(arr, winw, winw-1, axis=2) 40 1 12 12.0 0.0 arr = arr[:, :, :, ::stride] 41 42 1 12 12.0 0.0 arr = transpose(arr, [0, 2, 1, 3, 4]) 43 1 2435047 2435047.0 63.7 arr = arr.reshape(-1, clas_w.size) 44 45 1 837700 837700.0 21.9 resps2 = dot(arr, clas_w) 46 47 1 41 41.0 0.0 resps2 = resps2.ravel() 48 1 892 892.0 0.0 print resps2.mean()
On Fri, Feb 20, 2009 at 11:55 PM, David Cournapeau <cournape@gmail.com> wrote:
On Sat, Feb 21, 2009 at 1:46 PM, Nicolas Pinto <pinto@mit.edu> wrote:
Dear all,
I'm trying to optimize the code below and I was wondering if there is an efficient method that could reduce the numpy slicing overheard without going with cython. Is there anyway I could use mgrid to get a matrix with all my "windows" and then do a large matrix multiply instead?
If you only care about removing the two loops for the per-window processing, Anne Archibald and Robert Kern wrote a very useful function, segment_axis, which is like the matlab buffer function on steroids, using numpy stride tricks (to avoid copies in many cases). I think that would do everything you want, right ? I use it a lot in my own code, it may be worth being included in numpy or scipy proper.
http://projects.scipy.org/scipy/scikits/browser/trunk/talkbox/scikits/talkbo...
David _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Thanks again!
-- Nicolas Pinto Ph.D. Candidate, Brain & Computer Sciences Massachusetts Institute of Technology, USA http://web.mit.edu/pinto
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Hi, I don't really understand your array dimension, but to me it seems that you could use ndimage.correlate2d for this. Josef

Thanks Josef. I'm not sure how I could use correlate2d because of the 'stride' parameter on the y and x axes, but I may be able to do something on the z axis. On Sat, Feb 21, 2009 at 12:56 AM, <josef.pktd@gmail.com> wrote:
On Sat, Feb 21, 2009 at 12:36 AM, Nicolas Pinto <pinto@mit.edu> wrote:
Thanks a lot for the pointer to segmentaxis. I'm trying to use it "as is" and it seems that I need to a big reshape before the matrix multiplication. Am I missing something ?
========================================
import numpy as np from numpy import dot, transpose
arrh, arrw, arrd = 480,640,96 arr = np.random.randn(arrh, arrw, arrd).astype("float32")
stride = 16 winh, winw, wind = 128,64,96
limit = 100
clas_w = np.random.randn(8,4,96).astype("float32").ravel()
from segmentaxis import segment_axis
nh, nw = arrh-winh+1, arrw-winw+1
@profile def func_loop(arr):
resps = np.empty((nh,nw), dtype="float32") for j in xrange(nh): for i in xrange(nw): win = arr[j:j+winh:stride, i:i+winw:stride] win = win.ravel() resp = dot(win, clas_w) resps[j,i] = resp
resps = resps.ravel() print resps.mean()
@profile def func_segment(arr):
arr = segment_axis(arr, winh, winh-1, axis=0) arr = arr[:, ::stride]
arr = segment_axis(arr, winw, winw-1, axis=2) arr = arr[:, :, :, ::stride]
arr = transpose(arr, [0, 2, 1, 3, 4]) arr = arr.reshape(-1, clas_w.size)
resps2 = dot(arr, clas_w)
resps2 = resps2.ravel() print resps2.mean()
# ...
func_loop(arr)
func_segment(arr)
========================================
% python kernprof.py -l -v sliding_win_all.py w-0.0500485824034 segmentaxis.py:94: UserWarning: Problem with ndarray creation forces copy. warnings.warn("Problem with ndarray creation forces copy.") -0.0500485824034 Wrote profile results to sliding_win_all.py.lprof Timer unit: 1e-06 s
File: sliding_win_all.py Function: func_loop at line 18 Total time: 3.9785 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 18 @profile 19 def func_loop(arr): 20 21 1 28 28.0 0.0 resps = np.empty((nh,nw), dtype="float32") 22 354 308 0.9 0.0 for j in xrange(nh): 23 204034 199753 1.0 5.0 for i in xrange(nw): 24 203681 691915 3.4 17.4 win = arr[j:j+winh:stride, i:i+winw:stride] 25 203681 1341174 6.6 33.7 win = win.ravel() 26 203681 1417998 7.0 35.6 resp = dot(win, clas_w) 27 203681 326520 1.6 8.2 resps[j,i] = resp 28 29 1 2 2.0 0.0 resps = resps.ravel() 30 1 805 805.0 0.0 print resps.mean()
File: sliding_win_all.py Function: func_segment at line 33 Total time: 3.82026 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 33 @profile 34 def func_segment(arr): 35 36 1 43 43.0 0.0 arr = segment_axis(arr, winh, winh-1, axis=0) 37 1 4 4.0 0.0 arr = arr[:, ::stride] 38 39 1 546505 546505.0 14.3 arr = segment_axis(arr, winw, winw-1, axis=2) 40 1 12 12.0 0.0 arr = arr[:, :, :, ::stride] 41 42 1 12 12.0 0.0 arr = transpose(arr, [0, 2, 1, 3, 4]) 43 1 2435047 2435047.0 63.7 arr = arr.reshape(-1, clas_w.size) 44 45 1 837700 837700.0 21.9 resps2 = dot(arr, clas_w) 46 47 1 41 41.0 0.0 resps2 = resps2.ravel() 48 1 892 892.0 0.0 print resps2.mean()
On Fri, Feb 20, 2009 at 11:55 PM, David Cournapeau <cournape@gmail.com> wrote:
On Sat, Feb 21, 2009 at 1:46 PM, Nicolas Pinto <pinto@mit.edu> wrote:
Dear all,
I'm trying to optimize the code below and I was wondering if there is
an
efficient method that could reduce the numpy slicing overheard without going with cython. Is there anyway I could use mgrid to get a matrix with all my "windows" and then do a large matrix multiply instead?
If you only care about removing the two loops for the per-window processing, Anne Archibald and Robert Kern wrote a very useful function, segment_axis, which is like the matlab buffer function on steroids, using numpy stride tricks (to avoid copies in many cases). I think that would do everything you want, right ? I use it a lot in my own code, it may be worth being included in numpy or scipy proper.
http://projects.scipy.org/scipy/scikits/browser/trunk/talkbox/scikits/talkbo...
David _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Thanks again!
-- Nicolas Pinto Ph.D. Candidate, Brain & Computer Sciences Massachusetts Institute of Technology, USA http://web.mit.edu/pinto
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Hi,
I don't really understand your array dimension, but to me it seems that you could use ndimage.correlate2d for this.
Josef _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
-- Nicolas Pinto Ph.D. Candidate, Brain & Computer Sciences Massachusetts Institute of Technology, USA http://web.mit.edu/pinto
participants (3)
-
David Cournapeau
-
josef.pktd@gmail.com
-
Nicolas Pinto