Question: Placement of channels as the last axis - is this a sensible memory layout?
Hi,

I'm one of the maintainers of a project that has some overlap with scikit-image. Recently, I've been working on optimising some of our slower code paths, and I came across an interesting result. In our package, we order our image data in the same way as in scikit-image:

image.shape = [height, width, channels]

Since CPython runs on top of C, the default memory layout for NumPy arrays is row-major. In the C-contiguous layout of the shape above, that means the rightmost axis (the channels) indexes fastest. For example, consider this (2, 3, 4) array (stolen shamelessly from Wikipedia):

A = np.array([[[ 1,  2,  3,  4], [ 5,  6,  7,  8], [ 9, 10, 11, 12]],
              [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]])

Iterating over it in memory order, we walk:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Practically, this means that the most efficient direction to walk is along the channels.

However! In my experience, especially with 2D images, most of the relationships you care about are across the image plane, and the channels are treated as independent of one another. A simple example, and the one I was considering, is calculating the gradient of a multi-channel image (so a 3-channel RGB image returns a 6-channel image containing the x and y gradients of each of the R, G and B channels).

When calculating the gradient on a [height, width, channels] image, I have to jump much further in memory to find my neighbours, because I'm also skipping over the channels. This is very inefficient, and in my testing (implementing the gradient method in Cython) it causes a huge bottleneck. The more channels you have, the closer in performance my code comes to NumPy's pure Python gradient implementation, because both pieces of code are hurt by having to work 'against' the memory layout.

A practical example of this can be seen here: http://nbviewer.ipython.org/gist/patricksnape/7b31c14c7103c2995956

As mentioned, I have some code I've been working on that implements the gradient directly in Cython and uses OpenMP to parallelize the gradient over the channels, which is *significantly* faster for the channels-first case.

So, my question is: *why did you choose to place the channels on the last axis rather than the first?*
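P.S. To make the layout effect concrete without any Cython, here is a rough pure-NumPy sketch. It is not the benchmark from the notebook above, and the array sizes are made up; it just illustrates how a per-channel spatial operation interacts with the two layouts:

import numpy as np
from timeit import timeit

h, w, c = 512, 512, 16  # illustrative sizes only

img_last = np.random.rand(h, w, c)                              # [height, width, channels]
img_first = np.ascontiguousarray(img_last.transpose(2, 0, 1))   # [channels, height, width]

# With channels last, two horizontally adjacent pixels of the *same* channel
# are c * itemsize bytes apart; with channels first they are adjacent in memory.
print(img_last.strides)   # (w * c * 8, c * 8, 8)
print(img_first.strides)  # (h * w * 8, w * 8, 8)

def per_channel_gradient_last(img):
    # img[..., k] is a strided, non-contiguous view of one channel
    return [np.gradient(img[..., k]) for k in range(img.shape[-1])]

def per_channel_gradient_first(img):
    # img[k] is a contiguous 2D block
    return [np.gradient(img[k]) for k in range(img.shape[0])]

print(timeit(lambda: per_channel_gradient_last(img_last), number=10))
print(timeit(lambda: per_channel_gradient_first(img_first), number=10))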
Hey Patrick,

Good question! And one that I've thought about repeatedly. In my own work I usually "unwrap" the channels as the first step of my analysis, for the reasons you've outlined.

All these conventions were in place when I got into image analysis, so I can't comment on the history, but I can offer a few hypotheses:

0. Image storage formats group the channels together by default (e.g. TIFF).
1. Matlab uses this convention, where it actually makes computational sense, since Matlab uses column-major array storage.
2. It makes sense from a notational perspective: image[i, j] -> color.
3. It actually is useful for certain functions, such as SLIC, in which all the channels are considered together.

I'm guessing you are also interested in a follow-up question, namely: is this ever likely to change? To which I would say, probably not, as the convention is too embedded in the collective consciousness. In applications where each channel requires separate processing, storing the channels as separate grayscale files might make sense as a workaround.

Juan
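P.S. For the "unwrap the channels" step I mentioned, a minimal sketch in plain NumPy (this is not a scikit-image API, and the function names are just my own):

import numpy as np

def to_channel_planes(image):
    # (height, width, channels) -> (channels, height, width), each plane contiguous
    return np.ascontiguousarray(image.transpose(2, 0, 1))

def to_channels_last(planes):
    # inverse: back to the (height, width, channels) convention scikit-image uses
    return np.ascontiguousarray(planes.transpose(1, 2, 0))

rgb = np.random.rand(480, 640, 3)          # stand-in for an image loaded with skimage.io.imread
planes = to_channel_planes(rgb)
grads = [np.gradient(p) for p in planes]   # each plane is contiguous, so this walks memory in order
assert to_channels_last(planes).shape == rgb.shape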
participants (2)
- Juan Nunez-Iglesias
- Patrick Snape