Honestly, since you have to store your images somehow, and most file formats store each pixel's channel values together (interleaved), you are going to have to pay the cost of unwrapping the channels at some point. If you are doing a lot of manipulations in sequence, it makes sense to do it early in your pipeline. At that point, though, it might make the most sense to split the channels into separate grayscale images altogether, getting both speed and compatibility at the cost of perhaps some readability.
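To make the "separate grayscale images" idea concrete, here is a rough sketch. It assumes the image is a NumPy array shaped `(rows, cols, channels)`; the tiny `rgb` array and the names `red`, `green`, `blue` are made up for illustration.

```python
import numpy as np

# Hypothetical 4x5 RGB image with interleaved channels (default C order).
rgb = np.arange(4 * 5 * 3, dtype=np.uint8).reshape(4, 5, 3)

# rgb[..., c] is only a strided view into the interleaved buffer;
# np.ascontiguousarray copies it into its own contiguous 2D array.
channels = [np.ascontiguousarray(rgb[..., c]) for c in range(rgb.shape[-1])]
red, green, blue = channels

print(rgb[..., 0].flags['C_CONTIGUOUS'])  # False: the view skips over G and B
print(red.flags['C_CONTIGUOUS'])          # True: a plain grayscale image

# When a color image is needed again, stack the planes back together.
restored = np.dstack(channels)
```

You pay for the copy once, up front, and every later per-channel operation works on a contiguous grayscale array.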
Another option would be to just use column-major (Fortran) order for all your images. Although it's not the default, NumPy supports it (`order='F'`). For an array shaped `(rows, cols, channels)`, that layout keeps each channel in its own contiguous block of memory. I think all of our functions and NumPy's would work out of the box, and you'd get your performance boost! Come to think of it, that might be the magic you were looking for! =P
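A quick sketch of what that looks like, again assuming a `(rows, cols, channels)` array (the small `rgb_c` image is made up for the example). I haven't benchmarked this, so treat the speedup as something to measure rather than a given.

```python
import numpy as np

# Hypothetical 4x5 RGB image in the default (C, row-major) layout.
rgb_c = np.arange(4 * 5 * 3, dtype=np.uint8).reshape(4, 5, 3)

# Same shape and values, but stored in column-major (Fortran) order.
rgb_f = np.asfortranarray(rgb_c)

print(rgb_c.strides)  # (15, 3, 1): channel values are interleaved
print(rgb_f.strides)  # (1, 4, 20): each channel is one contiguous plane

# Indexing is unchanged; a single-channel slice is now contiguous.
plane = rgb_f[..., 0]
print(plane.flags['F_CONTIGUOUS'])  # True

# New arrays can also be allocated in this layout directly.
blank = np.zeros((4, 5, 3), dtype=np.uint8, order='F')
```

One thing to watch: some operations hand back C-ordered results (for instance, `ndarray.copy()` defaults to `order='C'`), so it is worth checking `.flags` at a few points in the pipeline.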