Optimized half-sizing of images?
(Second try on this message; the first time I included a test PNG that made it too large.)

Hi folks,

We have a need to generate half-size versions of RGB images as quickly as possible. PIL does a pretty good job, but it dawned on me that in the special case of half-sizing, one might be able to do it faster with numpy, simply averaging the four pixels in the larger image to create one pixel in the smaller. I'm doing tiling, and thus reducing 512x512 images to 256x256, so I imagine I'm making good use of cache (it does get pretty pokey with really large images!).

What I have now is essentially this:

    # a is a (h, w, 3) RGB array
    a2 = a[0::2, 0::2, :].astype(np.uint16)
    a2 += a[0::2, 1::2, :]
    a2 += a[1::2, 0::2, :]
    a2 += a[1::2, 1::2, :]
    a2 /= 4
    return a2.astype(np.uint8)

time: 67.2 ms per loop

I can speed it up a bit if I accumulate in a uint8 and divide as I go to prevent overflow:

    a2 = a[0::2, 0::2, :].astype(np.uint8) / 4
    a2 += a[0::2, 1::2, :] / 4
    a2 += a[1::2, 0::2, :] / 4
    a2 += a[1::2, 1::2, :] / 4
    return a2

time: 46.6 ms per loop

That does lose a touch of accuracy, I suppose (each term gets truncated before the sum), but nothing I can see. Method 1 is about twice as slow as PIL's bilinear scaling.

Can I do better? It seems it should be faster if I can avoid so many separate loops through the array. I figure there may be some way with filter or convolve or ndimage, but they all seem to return an array the same size. Any ideas? (Cython is another option, of course.)
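Another single-pass idea I haven't timed: carve the array into 2x2 blocks with a reshape, and average over the block axes. Just a sketch (the float intermediate from mean() may well eat any gains):

    def halfsize_reshape(a):
        # a is a (h, w, 3) uint8 RGB array with even h and w
        h2, w2 = a.shape[0] / 2, a.shape[1] / 2
        blocks = a.reshape(h2, 2, w2, 2, 3)  # one 2x2 block per output pixel
        return blocks.mean(axis=3).mean(axis=1).astype(np.uint8)

-Chris

Test code enclosed.

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov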
Hi Chris

2009/8/6 Christopher Barker <Chris.Barker@noaa.gov>:
Can I do better? It seems it should be faster if I can avoid so many separate loops through the array. I figure there may be some way with filter or convolve or ndimage, but they all seem to return an array the same size.
Are you willing to depend on SciPy? We've got pretty fast zooming code in ndimage.

If speed is a big issue, I'd consider using the GPU, which was made for this sort of down-sampling.

Cheers
Stéfan
Stéfan van der Walt wrote:
Are you willing to depend on SciPy? We've got pretty fast zooming code in ndimage.
I looked there, and didn't see it -- I didn't think to look for "zoom". Now that I do, I'll give it a try.

However, my thought is that for the special case of half-sizing, all that spline stuff could be unneeded and slower. I'll see what we get.

thanks,
-Chris
Christopher Barker wrote:
Stéfan van der Walt wrote:
Are you willing to depend on SciPy? We've got pretty fast zooming code in ndimage.
However, my thought is that for the special case of half-sizing, all that spline stuff could be unneeded and slower. I'll see what we get.
I've given that a try. The docs are pretty sparse, and I couldn't figure out a way to get it to do the whole image at once; rather, I had to do each band separately. It ended up about the same speed as my int accumulator method:

    def test5(a):
        """
        using ndimage

        tested on 512x512 RGB image: time: 40 ms per loop
        """
        h = a.shape[0] / 2
        w = a.shape[1] / 2
        a2 = np.empty((h, w, 3), dtype=np.uint8)
        a2[:, :, 0] = ndi.zoom(a[:, :, 0], 0.5, order=1)
        a2[:, :, 1] = ndi.zoom(a[:, :, 1], 0.5, order=1)
        a2[:, :, 2] = ndi.zoom(a[:, :, 2], 0.5, order=1)
        return a2

Is there a way to get it to do all three color bands at once?

NOTE: if I didn't set order to 1, it was MUCH slower, as one might expect.

-Chris
We have a need to generate half-size versions of RGB images as quickly as possible.
How good do these need to look? You could just throw away every other pixel... image[::2, ::2].

Failing that, you could also try using ndimage's convolve routines to run a 2x2 box filter over the image, and then throw away half of the pixels. But this would be slower than optimal, because the kernel would be convolved over every pixel, not just the ones you intend to keep.

Really though, I'd just bite the bullet and write a C extension (or cython, whatever, an extension to work for a defined-dimensionality, defined-dtype array is pretty simple), or as suggested before, do it on the GPU. (Though I find that readback from the GPU can be slow enough that C code can beat it in some cases.)

Zach
On Fri, Aug 7, 2009 at 3:46 AM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
We have a need to generate half-size versions of RGB images as quickly as possible.
How good do these need to look? You could just throw away every other pixel... image[::2, ::2].
Failing that, you could also try using ndimage's convolve routines to run a 2x2 box filter over the image, and then throw away half of the pixels. But this would be slower than optimal, because the kernel would be convolved over every pixel, not just the ones you intend to keep.
Really though, I'd just bite the bullet and write a C extension (or cython, whatever, an extension to work for a defined-dimensionality, defined-dtype array is pretty simple), or as suggested before, do it on the GPU. (Though I find that readback from the GPU can be slow enough that C code can beat it in some cases.)
Zach
Chris,

regarding your concerns about doing too-fancy interpolation at the cost of speed, I would guess the overall bottleneck is the memory access rather than the extra CPU cycles needed for interpolation.

Regarding ndimage.zoom, it should be able to "not zoom" the color axis but zoom the others in one call.

Cheers,
-- 
Sebastian Haase
On Aug 7, 2009, at 12:23 AM, Sebastian Haase wrote:
On Fri, Aug 7, 2009 at 3:46 AM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
We have a need to generate half-size versions of RGB images as quickly as possible.
How good do these need to look? You could just throw away every other pixel... image[::2, ::2].
Failing that, you could also try using ndimage's convolve routines to run a 2x2 box filter over the image, and then throw away half of the pixels. But this would be slower than optimal, because the kernel would be convolved over every pixel, not just the ones you intend to keep.
Really though, I'd just bite the bullet
You say that as if it's painful to do so :)

-------------------------------------

    import cython
    import numpy as np
    cimport numpy as np

    @cython.boundscheck(False)
    def halfsize_cython(np.ndarray[np.uint8_t, ndim=2, mode="c"] a):
        cdef unsigned int i, j, w, h
        w, h = a.shape[0], a.shape[1]
        cdef np.ndarray[np.uint8_t, ndim=2, mode="c"] a2 = np.ndarray((w/2, h/2), np.uint8)
        for i in range(w/2):
            for j in range(h/2):
                a2[i,j] = (<int>a[2*i,2*j] + a[2*i+1,2*j]
                           + a[2*i,2*j+1] + a[2*i+1,2*j+1]) / 4
        return a2

    def halfsize_slicing(a):
        a2 = a[0::2, 0::2].astype(np.uint8) / 4
        a2 += a[0::2, 1::2] / 4
        a2 += a[1::2, 0::2] / 4
        a2 += a[1::2, 1::2] / 4
        return a2

-------------------------------------

    sage: import numpy; from half_size import *
    sage: a = numpy.ndarray((512, 512), numpy.uint8)
    sage: timeit("halfsize_cython(a)")
    625 loops, best of 3: 604 µs per loop
    sage: timeit("halfsize_slicing(a)")
    5 loops, best of 3: 2.72 ms per loop
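If you want to try this outside of Sage, pyximport is an easy way to build it on the fly -- a sketch, assuming the code above is saved as half_size.pyx:

    import numpy
    import pyximport
    # numpy's headers are needed to compile the "cimport numpy" part
    pyximport.install(setup_args={"include_dirs": numpy.get_include()})

    from half_size import halfsize_cython
    a = numpy.zeros((512, 512), numpy.uint8)
    a2 = halfsize_cython(a)  # -> (256, 256) uint8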
and write a C extension (or cython, whatever, an extension to work for a defined-dimensionality, defined-dtype array is pretty simple), or as suggested before, do it on the GPU. (Though I find that readback from the GPU can be slow enough that C code can beat it in some cases.)
Zach
Chris, regarding your concerns about doing too-fancy interpolation at the cost of speed, I would guess the overall bottleneck is the memory access rather than the extra CPU cycles needed for interpolation. Regarding ndimage.zoom, it should be able to "not zoom" the color axis but zoom the others in one call.
I was about to say the same thing: it's probably the memory, not cycles, that's hurting you. Of course 512x512 is still small enough to fit in L2 of any modern computer.

- Robert
To finish off the thread for posterity:

Robert Bradshaw wrote:

<Robert's Cython code clipped.>

Robert's version operated on a 2-d array, so only one band at a time if you have RGB. So I edited it a bit:

    import cython
    import numpy as np
    cimport numpy as np

    @cython.boundscheck(False)
    def halfsize(np.ndarray[np.uint8_t, ndim=3, mode="c"] a):
        cdef unsigned int i, j, b, w, h, d
        w, h, d = a.shape[0], a.shape[1], a.shape[2]
        cdef np.ndarray[np.uint8_t, ndim=3, mode="c"] a2 = np.ndarray((w/2, h/2, 3), np.uint8)
        for i in range(w/2):
            for j in range(h/2):
                for b in range(d):  # color band
                    a2[i,j,b] = (<int>a[2*i,2*j,b] + a[2*i+1,2*j,b]
                                 + a[2*i,2*j+1,b] + a[2*i+1,2*j+1,b]) / 4
        return a2

This now does the whole RGB image at once, and is pretty snappy. Here are my timings for half-sizing a 512x512 RGB image:

    slicing, accumulating with a float32:            89   ms per loop
    slicing, accumulating with a uint16:             67   ms per loop
    slicing, all calculations in uint8:              47   ms per loop
    ndimage, one band at a time, 3rd-order spline:  280   ms per loop
    ndimage, one band at a time, 1st-order spline:   40   ms per loop
    Cython, one band at a time:                      11.6 ms per loop
    Cython, all bands at once:                        2.66 ms per loop
    PIL, BILINEAR interpolation:                     26.6 ms per loop

So: a ten times speed-up over PIL, and a 17 times speed-up over my fastest numpy version. If anyone has any suggestions on how to improve on the Cython version, I'd like to hear them, though I doubt it would make a practical difference.
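For anyone who wants to reproduce this later: a minimal build script for the extension would look something like the following (a sketch, assuming a reasonably recent Cython and that the code above is saved as half_size.pyx):

    # setup.py -- build in place with: python setup.py build_ext --inplace
    from distutils.core import setup
    from Cython.Build import cythonize
    import numpy

    setup(
        ext_modules=cythonize("half_size.pyx"),  # .pyx -> C -> extension module
        include_dirs=[numpy.get_include()],      # headers for "cimport numpy"
    )

-Chris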
Zachary Pincus wrote:
We have a need to generate half-size versions of RGB images as quickly as possible.
How good do these need to look? You could just throw away every other pixel... image[::2, ::2].
I need as good quality as I can get; throwing away pixels gets a bit ugly.
Failing that, you could also try using ndimage's convolve routines to run a 2x2 box filter over the image, and then throw away half of the pixels. But this would be slower than optimal, because the kernel would be convolved over every pixel, not just the ones you intend to keep.
yup -- worth a try though.
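Maybe something along these lines? (an untested sketch -- the exact alignment of ndimage's even-sized kernels would need checking):

    import numpy as np
    import scipy.ndimage as ndi

    def halfsize_boxfilter(a):
        # box-average every pixel with its up/left neighbors...
        kernel = np.ones((2, 2, 1), dtype=np.float32) / 4.0
        smoothed = ndi.correlate(a.astype(np.float32), kernel)
        # ...then keep one pixel per 2x2 block
        return smoothed[1::2, 1::2, :].astype(np.uint8)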
Really though, I'd just bite the bullet and write a C extension (or cython, whatever, an extension to work for a defined-dimensionality, defined-dtype array is pretty simple),
I was going to sit down and do that this morning, but...
or as suggested before, do it on the GPU.
I have no idea how to do that, except maybe pyOpenGL, which is on our list to try.

Sebastian Haase wrote:
regarding your concerns about doing too-fancy interpolation at the cost of speed, I would guess the overall bottleneck is the memory access rather than the extra CPU cycles needed for interpolation.
well, could be, though I can't really know 'till I try. One example, though, is using ndimage.zoom -- order-1 interpolation is MUCH faster than order 2 or 3.
Regarding ndimage.zoom, it should be able to "not zoom" the color axis but zoom the others in one call.
well, that's what I thought, but I can't figure out how to do it. The docs are a bit sparse. Here's my offer: If someone tells me how to do it, I'll make a docs contribution to the SciPy docs explaining it.
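My best guess from the docstring is that zoom takes one factor per axis, so a factor of 1 would leave the color axis alone -- something like this, untested:

    import scipy.ndimage as ndi

    # halve the two spatial axes, leave the color axis as-is:
    # (512, 512, 3) -> (256, 256, 3)
    a2 = ndi.zoom(a, (0.5, 0.5, 1.0), order=1)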
You say that as if it's painful to do so :)
wow! Thanks for doing my work for me. I thought this would be a good case to give Cython a try for the first time -- having a working example is great.
sage: timeit("halfsize_cython(a)")
625 loops, best of 3: 604 µs per loop
sage: timeit("halfsize_slicing(a)")
5 loops, best of 3: 2.72 ms per loop
and bingo! a 4.5 times speed-up -- I think that's enough to see in our app.
I was about to say the same thing: it's probably the memory, not cycles, that's hurting you.
sure, but the slicing method pushes that memory around more than it needs to.
Of course 512x512 is still small enough to fit in L2 of any modern computer.
I think so -- I do know that the slicing method slows down a lot with larger images. We're tiling anyway in this case, but if I did want to do a big image, I'd probably break it down into chunks to process it.

thanks, all.

-Chris
participants (5)

- Christopher Barker
- Robert Bradshaw
- Sebastian Haase
- Stéfan van der Walt
- Zachary Pincus