Mailman 3 best way of speeding up a filtering-like algorithm - NumPy-Discussion

best way of speeding up a filtering-like algorithm

Moroney, Catherine M (398E)

29 Mar 2018 29 Mar '18

12:56 a.m.

Hello, I have the following sample code (pretty simple algorithm that uses a rolling filter window) and am wondering what the best way is of speeding it up. I tried rewriting it in Cython by pre-declaring the variables but that didn’t buy me a lot of time. Then I rewrote it in Fortran (and compiled it with f2py) and now it’s lightning fast. But I would still like to know if I could rewrite it in pure python/numpy/scipy or in Cython and get a similar speedup. Here is the raw Python code: def mixed_coastline_slow(nsidc, radius, count, mask=None): nsidc_copy = numpy.copy(nsidc) if (mask is None): idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED) else: idx_coastline = numpy.where(mask & (nsidc_copy == NSIDC_COASTLINE_MIXED)) for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]): rows = ( max(irow0-radius, 0), min(irow0+radius+1, nsidc_copy.shape[0]) ) cols = ( max(icol0-radius, 0), min(icol0+radius+1, nsidc_copy.shape[1]) ) window = nsidc[rows[0]:rows[1], cols[0]:cols[1]] npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum() nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \ True, False).sum() if (100.0*nsnowice/npoints >= count): nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD return nsidc_copy and here is my attempt at Cython-izing it: import numpy cimport numpy as cnumpy cimport cython cdef int NSIDC_SIZE = 721 cdef int NSIDC_NO_SNOW = 0 cdef int NSIDC_ALL_SNOW = 100 cdef int NSIDC_FRESHSNOW = 103 cdef int NSIDC_PERMSNOW = 101 cdef int NSIDC_SEAICE_LOW = 1 cdef int NSIDC_SEAICE_HIGH = 100 cdef int NSIDC_COASTLINE_MIXED = 252 cdef int NSIDC_SUSPECT_ICE = 253 cdef int MISR_SEAICE_THRESHOLD = 6 def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int radius, int count): cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \ = numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8) cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \ = numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8) nsidc2 = numpy.copy(nsidc) idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED) for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]): irow1 = max(irow-radius, 0) irow2 = min(irow+radius+1, NSIDC_SIZE) icol1 = max(icol-radius, 0) icol2 = min(icol+radius+1, NSIDC_SIZE) window = nsidc[irow1:irow2, icol1:icol2] npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum() nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \ True, False).sum() if (100.0*nsnowice/npoints >= count): nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD return nsidc2 Thanks in advance for any advice! Catherine

Attachments:

attachment.htm (text/html — 13.7 KB)

Show replies by date

Eric Wieser

29 Mar 29 Mar

1:43 a.m.

Well, one tip to start with: numpy.where(some_comparison, True, False) is the same as but slower than some_comparison Eric On Wed, 28 Mar 2018 at 18:36 Moroney, Catherine M (398E) < Catherine.M.Moroney@jpl.nasa.gov> wrote:

...

Hello,

I have the following sample code (pretty simple algorithm that uses a rolling filter window) and am wondering what the best way is of speeding it up. I tried rewriting it in Cython by pre-declaring the variables but that didn’t buy me a lot of time. Then I rewrote it in Fortran (and compiled it with f2py) and now it’s lightning fast. But I would still like to know if I could rewrite it in pure python/numpy/scipy or in Cython and get a similar speedup.

Here is the raw Python code:

def mixed_coastline_slow(nsidc, radius, count, mask=None):

nsidc_copy = numpy.copy(nsidc)

if (mask is None):

idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED)

else:

idx_coastline = numpy.where(mask & (nsidc_copy == NSIDC_COASTLINE_MIXED))

for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]):

rows = ( max(irow0-radius, 0), min(irow0+radius+1, nsidc_copy.shape[0]) )

cols = ( max(icol0-radius, 0), min(icol0+radius+1, nsidc_copy.shape[1]) )

window = nsidc[rows[0]:rows[1], cols[0]:cols[1]]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD

return nsidc_copy

and here is my attempt at Cython-izing it:

import numpy

cimport numpy as cnumpy

cimport cython

cdef int NSIDC_SIZE = 721

cdef int NSIDC_NO_SNOW = 0

cdef int NSIDC_ALL_SNOW = 100

cdef int NSIDC_FRESHSNOW = 103

cdef int NSIDC_PERMSNOW = 101

cdef int NSIDC_SEAICE_LOW = 1

cdef int NSIDC_SEAICE_HIGH = 100

cdef int NSIDC_COASTLINE_MIXED = 252

cdef int NSIDC_SUSPECT_ICE = 253

cdef int MISR_SEAICE_THRESHOLD = 6

def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int radius, int count):

cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \

= numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8)

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \

= numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8)

nsidc2 = numpy.copy(nsidc)

idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED)

for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]):

irow1 = max(irow-radius, 0)

irow2 = min(irow+radius+1, NSIDC_SIZE)

icol1 = max(icol-radius, 0)

icol2 = min(icol+radius+1, NSIDC_SIZE)

window = nsidc[irow1:irow2, icol1:icol2]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD

return nsidc2

Thanks in advance for any advice!

Catherine

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

Joseph Fox-Rabinovitz

4:10 a.m.

It looks like you are creating a coastline mask (or a coastline mask + some other mask), and computing the ratio of two quantities in a particular window around each point. If your coastline covers a sufficiently large portion of the image, you may get quite a bit of mileage using an efficient convolution instead of summing the windows directly. For example, you could use scipy.signal.convolve2d with inputs being (nsidc_copy != NSIDC_COASTLINE_MIXED), (nsidc_copy == NSIDC_SEAICE_LOW & nsdic_copy == NSIDC_FRESHSNOW) for the frst array, and a (2*radius x 2*radius) array of ones for the second. You may have to center the block of ones in an array of zeros the same size as nsdic_copy, but I am not sure about that. Another option you may want to try is implementing your window movement more efficiently. If you step your window center along using an algorithm like flood-fill, you can insure that there will be very large overlap between successive steps (even if there is a break in the coastline). That means that you can reuse most of the data you've extracted. You will only need to subtract off the non-overlapping portion of the previous window and add in the non-overlapping portion of the updated window. If radius is 16, giving you a 32x32 window, you go from summing ~1000 pixels per quantity of interest, to summing only ~120 if the window moves along a diagonal, and only 64 if it moves vertically or horizontally. While an algorithm like this will probably give you the greatest boost, it is a pain to implement. If I had to guess, this looks like L2 processing for a multi-spectral instrument. If you don't mind me asking, what mission is this for? I'm working on space-looking detectors at the moment, but have spent many years on the L0, L1b and L1 portions of the GOES-R ground system. - Joe On Wed, Mar 28, 2018 at 9:43 PM, Eric Wieser wrote:

...

Well, one tip to start with:

numpy.where(some_comparison, True, False)

is the same as but slower than

some_comparison

Eric

On Wed, 28 Mar 2018 at 18:36 Moroney, Catherine M (398E) wrote:

...
Hello,

I have the following sample code (pretty simple algorithm that uses a rolling filter window) and am wondering what the best way is of speeding it up. I tried rewriting it in Cython by pre-declaring the variables but that didn’t buy me a lot of time. Then I rewrote it in Fortran (and compiled it with f2py) and now it’s lightning fast. But I would still like to know if I could rewrite it in pure python/numpy/scipy or in Cython and get a similar speedup.

Here is the raw Python code:

def mixed_coastline_slow(nsidc, radius, count, mask=None):

nsidc_copy = numpy.copy(nsidc)

if (mask is None):

idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED)

else:

idx_coastline = numpy.where(mask & (nsidc_copy == NSIDC_COASTLINE_MIXED))

for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]):

rows = ( max(irow0-radius, 0), min(irow0+radius+1, nsidc_copy.shape[0]) )

cols = ( max(icol0-radius, 0), min(icol0+radius+1, nsidc_copy.shape[1]) )

window = nsidc[rows[0]:rows[1], cols[0]:cols[1]]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD

return nsidc_copy

and here is my attempt at Cython-izing it:

import numpy

cimport numpy as cnumpy

cimport cython

cdef int NSIDC_SIZE = 721

cdef int NSIDC_NO_SNOW = 0

cdef int NSIDC_ALL_SNOW = 100

cdef int NSIDC_FRESHSNOW = 103

cdef int NSIDC_PERMSNOW = 101

cdef int NSIDC_SEAICE_LOW = 1

cdef int NSIDC_SEAICE_HIGH = 100

cdef int NSIDC_COASTLINE_MIXED = 252

cdef int NSIDC_SUSPECT_ICE = 253

cdef int MISR_SEAICE_THRESHOLD = 6

def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int radius, int count):

cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \

= numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8)

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \

= numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8)

nsidc2 = numpy.copy(nsidc)

idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED)

for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]):

irow1 = max(irow-radius, 0)

irow2 = min(irow+radius+1, NSIDC_SIZE)

icol1 = max(icol-radius, 0)

icol2 = min(icol+radius+1, NSIDC_SIZE)

window = nsidc[irow1:irow2, icol1:icol2]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD

return nsidc2

Thanks in advance for any advice!

Catherine

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

Stuart Reynolds

3:14 p.m.

Install snakeviz to visualize what’s taking all the time. You might want to check out numba.jit(nopython) for optimizing specific sections. On Wed, Mar 28, 2018 at 9:10 PM Joseph Fox-Rabinovitz < jfoxrabinovitz@gmail.com> wrote:

...

It looks like you are creating a coastline mask (or a coastline mask + some other mask), and computing the ratio of two quantities in a particular window around each point. If your coastline covers a sufficiently large portion of the image, you may get quite a bit of mileage using an efficient convolution instead of summing the windows directly. For example, you could use scipy.signal.convolve2d with inputs being (nsidc_copy != NSIDC_COASTLINE_MIXED), (nsidc_copy == NSIDC_SEAICE_LOW & nsdic_copy == NSIDC_FRESHSNOW) for the frst array, and a (2*radius x 2*radius) array of ones for the second. You may have to center the block of ones in an array of zeros the same size as nsdic_copy, but I am not sure about that.

Another option you may want to try is implementing your window movement more efficiently. If you step your window center along using an algorithm like flood-fill, you can insure that there will be very large overlap between successive steps (even if there is a break in the coastline). That means that you can reuse most of the data you've extracted. You will only need to subtract off the non-overlapping portion of the previous window and add in the non-overlapping portion of the updated window. If radius is 16, giving you a 32x32 window, you go from summing ~1000 pixels per quantity of interest, to summing only ~120 if the window moves along a diagonal, and only 64 if it moves vertically or horizontally. While an algorithm like this will probably give you the greatest boost, it is a pain to implement.

If I had to guess, this looks like L2 processing for a multi-spectral instrument. If you don't mind me asking, what mission is this for? I'm working on space-looking detectors at the moment, but have spent many years on the L0, L1b and L1 portions of the GOES-R ground system.

- Joe

...
Well, one tip to start with:

numpy.where(some_comparison, True, False)

is the same as but slower than

some_comparison

Eric

On Wed, 28 Mar 2018 at 18:36 Moroney, Catherine M (398E) wrote:

...
Hello,

I have the following sample code (pretty simple algorithm that uses a rolling filter window) and am wondering what the best way is of

speeding it

...
up. I tried rewriting it in Cython by pre-declaring the variables but

On Wed, Mar 28, 2018 at 9:43 PM, Eric Wieser wrote: that

...
...
didn’t buy me a lot of time. Then I rewrote it in Fortran (and compiled it with f2py) and now it’s lightning fast. But I would still like to know if I could rewrite it in pure python/numpy/scipy or in Cython and get a similar speedup.

Here is the raw Python code:

def mixed_coastline_slow(nsidc, radius, count, mask=None):

nsidc_copy = numpy.copy(nsidc)

if (mask is None):

idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED)

else:

idx_coastline = numpy.where(mask & (nsidc_copy == NSIDC_COASTLINE_MIXED))

for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]):

rows = ( max(irow0-radius, 0), min(irow0+radius+1, nsidc_copy.shape[0]) )

cols = ( max(icol0-radius, 0), min(icol0+radius+1, nsidc_copy.shape[1]) )

window = nsidc[rows[0]:rows[1], cols[0]:cols[1]]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD

return nsidc_copy

and here is my attempt at Cython-izing it:

import numpy

cimport numpy as cnumpy

cimport cython

cdef int NSIDC_SIZE = 721

cdef int NSIDC_NO_SNOW = 0

cdef int NSIDC_ALL_SNOW = 100

cdef int NSIDC_FRESHSNOW = 103

cdef int NSIDC_PERMSNOW = 101

cdef int NSIDC_SEAICE_LOW = 1

cdef int NSIDC_SEAICE_HIGH = 100

cdef int NSIDC_COASTLINE_MIXED = 252

cdef int NSIDC_SUSPECT_ICE = 253

cdef int MISR_SEAICE_THRESHOLD = 6

def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int radius, int count):

cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \

= numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8)

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \

= numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8)

nsidc2 = numpy.copy(nsidc)

idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED)

for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]):

irow1 = max(irow-radius, 0)

irow2 = min(irow+radius+1, NSIDC_SIZE)

icol1 = max(icol-radius, 0)

icol2 = min(icol+radius+1, NSIDC_SIZE)

window = nsidc[irow1:irow2, icol1:icol2]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD

return nsidc2

Thanks in advance for any advice!

Catherine

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

Chris Barker

5:23 p.m.

sorry, not enough time to look closely, but a couple general comments: On Wed, Mar 28, 2018 at 5:56 PM, Moroney, Catherine M (398E) < Catherine.M.Moroney@jpl.nasa.gov> wrote:

...

I have the following sample code (pretty simple algorithm that uses a rolling filter window) and am wondering what the best way is of speeding it up. I tried rewriting it in Cython by pre-declaring the variables but that didn’t buy me a lot of time. Then I rewrote it in Fortran (and compiled it with f2py) and now it’s lightning fast.

if done right, Cython should be almost as fast as Fortran, and just as fast if you use the "restrict" correctly (which I hope can be done in Cython): https://en.wikipedia.org/wiki/Pointer_aliasing

...

But I would still like to know if I could rewrite it in pure python/numpy/scipy

you can use stride_tricks to make arrays "appear" to be N+1 D, to implement windows without actually duplicating the data, and then use array operations on them. This can buy a lot of speed, but will not be as fast (by a factor of 10 or so) as Cython or Fortran see: https://github.com/PythonCHB/IRIS_Python_Class/blob/master/Numpy/code/filter... for and example in 1D

...

or in Cython and get a similar speedup.

see above -- a direct port of your Fortran code to Cython should get you within a factor of two or so of the Fortran, and then using "restrict" to let the compiler know your pointers aren't aliased should get you the reset of the way. Here is an example of a Automatic Gain Control filter in 1D, iplimented in numpy with stride_triks, and C and Cython and Fortran. https://github.com/PythonCHB/IRIS_Python_Class/tree/master/Interfacing_C/agc... Note that in that example, I never got C or Cython as fast as Fortran -- but I think using "restrict" in the C would do it. HTH, -CHB

...

Here is the raw Python code:

def mixed_coastline_slow(nsidc, radius, count, mask=None):

nsidc_copy = numpy.copy(nsidc)

if (mask is None):

idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED)

else:

idx_coastline = numpy.where(mask & (nsidc_copy == NSIDC_COASTLINE_MIXED))

for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]):

rows = ( max(irow0-radius, 0), min(irow0+radius+1, nsidc_copy.shape[0]) )

cols = ( max(icol0-radius, 0), min(icol0+radius+1, nsidc_copy.shape[1]) )

window = nsidc[rows[0]:rows[1], cols[0]:cols[1]]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD

return nsidc_copy

and here is my attempt at Cython-izing it:

import numpy

cimport numpy as cnumpy

cimport cython

cdef int NSIDC_SIZE = 721

cdef int NSIDC_NO_SNOW = 0

cdef int NSIDC_ALL_SNOW = 100

cdef int NSIDC_FRESHSNOW = 103

cdef int NSIDC_PERMSNOW = 101

cdef int NSIDC_SEAICE_LOW = 1

cdef int NSIDC_SEAICE_HIGH = 100

cdef int NSIDC_COASTLINE_MIXED = 252

cdef int NSIDC_SUSPECT_ICE = 253

cdef int MISR_SEAICE_THRESHOLD = 6

def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int radius, int count):

cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \

= numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8)

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \

= numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8)

nsidc2 = numpy.copy(nsidc)

idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED)

for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]):

irow1 = max(irow-radius, 0)

irow2 = min(irow+radius+1, NSIDC_SIZE)

icol1 = max(icol-radius, 0)

icol2 = min(icol+radius+1, NSIDC_SIZE)

window = nsidc[irow1:irow2, icol1:icol2]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD

return nsidc2

Thanks in advance for any advice!

Catherine

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Chris Barker

5:26 p.m.

one other note: As a rule, using numpy array operations from Cython doesn't buy you much, as you discovered. YOu need to use numpy arrays as n-d containers, and write the loops yourself. You may want to check out numba as another alternative -- it DOES optimize numpy operations. -CHB On Wed, Mar 28, 2018 at 5:56 PM, Moroney, Catherine M (398E) < Catherine.M.Moroney@jpl.nasa.gov> wrote:

...

Hello,

I have the following sample code (pretty simple algorithm that uses a rolling filter window) and am wondering what the best way is of speeding it up. I tried rewriting it in Cython by pre-declaring the variables but that didn’t buy me a lot of time. Then I rewrote it in Fortran (and compiled it with f2py) and now it’s lightning fast. But I would still like to know if I could rewrite it in pure python/numpy/scipy or in Cython and get a similar speedup.

Here is the raw Python code:

def mixed_coastline_slow(nsidc, radius, count, mask=None):

nsidc_copy = numpy.copy(nsidc)

if (mask is None):

idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED)

else:

idx_coastline = numpy.where(mask & (nsidc_copy == NSIDC_COASTLINE_MIXED))

for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]):

rows = ( max(irow0-radius, 0), min(irow0+radius+1, nsidc_copy.shape[0]) )

cols = ( max(icol0-radius, 0), min(icol0+radius+1, nsidc_copy.shape[1]) )

window = nsidc[rows[0]:rows[1], cols[0]:cols[1]]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD

return nsidc_copy

and here is my attempt at Cython-izing it:

import numpy

cimport numpy as cnumpy

cimport cython

cdef int NSIDC_SIZE = 721

cdef int NSIDC_NO_SNOW = 0

cdef int NSIDC_ALL_SNOW = 100

cdef int NSIDC_FRESHSNOW = 103

cdef int NSIDC_PERMSNOW = 101

cdef int NSIDC_SEAICE_LOW = 1

cdef int NSIDC_SEAICE_HIGH = 100

cdef int NSIDC_COASTLINE_MIXED = 252

cdef int NSIDC_SUSPECT_ICE = 253

cdef int MISR_SEAICE_THRESHOLD = 6

def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int radius, int count):

cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \

= numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8)

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \

= numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8)

nsidc2 = numpy.copy(nsidc)

idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED)

for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]):

irow1 = max(irow-radius, 0)

irow2 = min(irow+radius+1, NSIDC_SIZE)

icol1 = max(icol-radius, 0)

icol2 = min(icol+radius+1, NSIDC_SIZE)

window = nsidc[irow1:irow2, icol1:icol2]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD

return nsidc2

Thanks in advance for any advice!

Catherine

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

Jaime Fernández del Río

9:34 p.m.

...

From here it should be more or less straightforward to reproduce the rest of your calculations. As written this code only handles points a distance of at least radius from an array edge. If the edges are important to you,

Hi Catherine, One problem with sliding window algorithms is that the straightforward approach can be very inefficient. Ideally you would want to not recompute your windowed quantity from all points in the window, but to reuse the result from an overlapping window and only take into account the points that have changed in the sliding of the window. In your case this can be efficiently done using a summed area table https://en.wikipedia.org/wiki/Summed-area_table. Consider these two auxiliary functions: def summed_area_table(array): rows, cols = array.shape out = np.zeros((rows + 1, cols + 1), np.intp) np.cumsum(array, axis=0, out=out[1:, 1:]) np.cumsum(out[1:, 1:], axis=1, out=out[1:, 1:]) return out def windowed_sum_from_summed_area_table(array, size): sat = summed_area_table(array) return (sat[:-size, :-size] + sat[size:, size:] - sat[:-size, size:] - sat[size:, -size:]) Using these, you can compute npoints and nsnowice for all points in your input nsidc array as: mask_coastline = nsidc == NSIDC_COASTLINE_MIXED mask_not_coastline = ~mask_coastline mask_snowice = (nsidc >= NSIDC_SEAICE_LOW) & (nsidc <= NSIDC_FRESHSNOW) nsnowice = windowed_sum_from_summed_area_table(mask_snowice, 2*radius + 1) npoints = windowed_sum_from_summed_area_table(mask_not_coastline, 2*radius + 1) they can also be extracted from the summed area table, but the expressions get ugly: it may be cleaner, even if slower, to pad the masks with zeros before summing them up. Also, if the fraction of points that are in mask_coastline is very small, you may be doing way too many unnecessary calculations. Good luck! Jaime On Thu, Mar 29, 2018 at 3:36 AM Moroney, Catherine M (398E) < Catherine.M.Moroney@jpl.nasa.gov> wrote:

...

Hello,

I have the following sample code (pretty simple algorithm that uses a rolling filter window) and am wondering what the best way is of speeding it up. I tried rewriting it in Cython by pre-declaring the variables but that didn’t buy me a lot of time. Then I rewrote it in Fortran (and compiled it with f2py) and now it’s lightning fast. But I would still like to know if I could rewrite it in pure python/numpy/scipy or in Cython and get a similar speedup.

Here is the raw Python code:

def mixed_coastline_slow(nsidc, radius, count, mask=None):

nsidc_copy = numpy.copy(nsidc)

if (mask is None):

idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED)

else:

idx_coastline = numpy.where(mask & (nsidc_copy == NSIDC_COASTLINE_MIXED))

for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]):

rows = ( max(irow0-radius, 0), min(irow0+radius+1, nsidc_copy.shape[0]) )

cols = ( max(icol0-radius, 0), min(icol0+radius+1, nsidc_copy.shape[1]) )

window = nsidc[rows[0]:rows[1], cols[0]:cols[1]]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD

return nsidc_copy

and here is my attempt at Cython-izing it:

import numpy

cimport numpy as cnumpy

cimport cython

cdef int NSIDC_SIZE = 721

cdef int NSIDC_NO_SNOW = 0

cdef int NSIDC_ALL_SNOW = 100

cdef int NSIDC_FRESHSNOW = 103

cdef int NSIDC_PERMSNOW = 101

cdef int NSIDC_SEAICE_LOW = 1

cdef int NSIDC_SEAICE_HIGH = 100

cdef int NSIDC_COASTLINE_MIXED = 252

cdef int NSIDC_SUSPECT_ICE = 253

cdef int MISR_SEAICE_THRESHOLD = 6

def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int radius, int count):

cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \

= numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8)

cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \

= numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8)

nsidc2 = numpy.copy(nsidc)

idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED)

for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]):

irow1 = max(irow-radius, 0)

irow2 = min(irow+radius+1, NSIDC_SIZE)

icol1 = max(icol-radius, 0)

icol2 = min(icol+radius+1, NSIDC_SIZE)

window = nsidc[irow1:irow2, icol1:icol2]

npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True, False).sum()

nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <= NSIDC_FRESHSNOW), \

True, False).sum()

if (100.0*nsnowice/npoints >= count):

nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD

return nsidc2

Thanks in advance for any advice!

Catherine

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

2219

Age (days ago)

2219

Last active (days ago)

List overview

Download

6 comments

6 participants

participants (6)

Chris Barker
Eric Wieser
Jaime Fernández del Río
Joseph Fox-Rabinovitz
Moroney, Catherine M (398E)
Stuart Reynolds

best way of speeding up a filtering-like algorithm

Moroney, Catherine M (398E)

Eric Wieser

Joseph Fox-Rabinovitz

Stuart Reynolds

Chris Barker

Chris Barker

Jaime Fernández del Río

tags

participants (6)