
On 26.10.2011 10:51, Neil Yager wrote:
I was having a conversation about data types with Stéfan in the line comments of a PR, and I thought I should move it here so others can benefit from his explanations as well.
Being new to the project, I didn't appreciate the intricacies of data typing.
After working with images for some time, this still annoys me, and I don't know of a library that has a good solution. OpenCV, for example, is a mess. I think this is a very important topic, since it influences usability a lot.
For example, I was surprised to see that this raises a ValueError:
    skimage.img_as_float(np.arange(9).reshape((3, 3)))

The problem is that the default dtype of np.arange is int32, which isn't supported by skimage, so img_as_float doesn't know how to scale it to [0, 1]. Perhaps it is correct to fail, as it will force the user to consider the data type issue. However, it does seem like a reasonable/common thing to want to do.
We briefly discussed this issue on the list, and Stefan thought it would be good to make the user think about what they want to achieve. I find this not completely satisfying, but I could not come up with a better solution. I do not think that using np.arange(n) is a reasonable/common thing to do, by the way. What is the expected behavior? By definition, the output can be in any range. If you fix any range, either you'll fall outside it for large n or you'll see nothing for small n. Maybe the most reasonable thing would be to expect that img_as_float(np.arange(n)) always returns something with minimum 0 and maximum 1. The only way to achieve that would be to determine the range of an int image by taking the max each time you use it. This, of course, would lead to unexpected behavior in other places, so I'm not sure it actually makes things better. We'd have to be careful.
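The trade-off described above can be sketched with NumPy alone. Both helper names below (scale_fixed_range, scale_min_max) are hypothetical and not part of skimage; they only illustrate the two options under discussion, a range fixed up front versus a range taken from each image.

```python
import numpy as np

def scale_fixed_range(image, max_value):
    """Hypothetical helper: map [0, max_value] to [0, 1] using a range chosen up front."""
    return np.asarray(image, dtype=np.float64) / max_value

def scale_min_max(image):
    """Hypothetical helper: stretch each image so its minimum is 0 and its maximum is 1."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)  # constant image: there is no range to stretch
    return (image - lo) / (hi - lo)

small = np.arange(9).reshape((3, 3))
large = np.arange(1000)

# Fixed range: small inputs stay close to 0, large inputs overshoot 1.0.
fixed_small = scale_fixed_range(small, 255)
fixed_large = scale_fixed_range(large, 255)

# Per-image range: the result always spans [0, 1], but the same input value
# maps to a different output depending on what else is in the image.
mm_small = scale_min_max(small)
mm_large = scale_min_max(large)
```

With the per-image version, the value 4 becomes 0.5 in the small array but roughly 0.004 in the large one, which is the kind of unexpected behavior the paragraph above warns about.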
A related, but different, issue is the following:
    x = np.arange(9, dtype=np.uint8).reshape((3, 3))
    x
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]], dtype=uint8)
    y = skimage.img_as_ubyte(x.astype(np.float32))
    WARNING:dtype_converter:Possible precision loss, converting from float32 to uint8
    y
    array([[  0, 255, 254],
           [253, 252, 251],
           [250, 249, 248]], dtype=uint8)
The problem here is that the input to img_as_ubyte violates skimage's assumption that floating point images have the range [0, 1], leading to an unexpected result (at least for a beginner). There is a warning, but that's for a different problem. Should img_as_ubyte, img_as_float, etc. check and enforce ranges? Or raise warnings? Any thoughts?
I think this is perfectly fine. You used "astype". That's evil! Maybe we can check whether the upper bound is satisfied. That probably wouldn't hurt much if we convert anyway. Also, we should stress in the (at the moment not really existing) user guide that users should NEVER EVER use "astype" on an image, since that violates all our assumptions.
Cheers,
Andy

I do not think that using np.arange(n) is a reasonable/common thing to do, by the way. What is the expected behavior?
I've seen it used for demo/testing for quickly creating an array with a range of values (it is being used in a unit test). In the context of this discussion, it is just an example of a way a user may end up with an array of int32s without really thinking about it, thereby getting themselves into trouble.
Maybe we can check whether the upper bound is satisfied. That probably wouldn't hurt much if we convert anyway.
I think that might be the best way.
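As a rough sketch of what such an upper-bound check could look like (check_unit_range is a hypothetical name, not an existing skimage function, and the choice to warn rather than raise is an assumption):

```python
import warnings
import numpy as np

def check_unit_range(image):
    """Hypothetical check: warn if a float image exceeds the assumed upper bound of 1."""
    image = np.asarray(image)
    if np.issubdtype(image.dtype, np.floating) and image.size:
        if image.max() > 1.0:
            warnings.warn("float image has values above 1.0", stacklevel=2)
            return False
    return True

# A float image inside [0, 1] passes; one built from raw integer values does not.
ok = check_unit_range(np.linspace(0.0, 1.0, 9).reshape((3, 3)))
bad = check_unit_range(np.arange(9, dtype=np.float32).reshape((3, 3)))
```

The check costs one pass over the image to find its maximum, which is small next to the conversion itself.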
Also, we should stress in the (at the moment not really existing) user guide, that users should NEVER EVER use "astype" on an image, since that violates all our assumptions.
I completely agree. However, the explicit use of "astype" is just one of many ways to find yourself in this situation. e.g.:
    x = np.arange(9).reshape((3, 3)) + 1.
    y = skimage.img_as_ubyte(x)
    y
    array([[255, 254, 253],
           [252, 251, 250],
           [249, 248, 247]], dtype=uint8)
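Both outputs shown in this thread are consistent with each value being multiplied by 255 and then wrapping modulo 256 when cast to uint8. That is an assumption about what the converter did at the time, not something stated in the thread. The sketch below uses integer arithmetic so that the wraparound is deterministic; wrapped_ubyte is a hypothetical name used only for illustration.

```python
import numpy as np

def wrapped_ubyte(values):
    """Illustration only: scale by 255, then keep the low 8 bits (modulo 256)."""
    scaled = np.asarray(values, dtype=np.int64) * 255
    return (scaled % 256).astype(np.uint8)

first = wrapped_ubyte(np.arange(9).reshape((3, 3)))       # input values 0..8
second = wrapped_ubyte(np.arange(9).reshape((3, 3)) + 1)  # input values 1..9
```

For instance, 2 * 255 = 510 and 510 mod 256 = 254, which matches the third entry of the first output.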
The core issue is to make sure that users know the assumed range for floats.
Neil
Participants (2): Andreas Mueller, Neil Yager