Re: [scikit-image] data types

Thanks everyone for your comments. Ralf’s comment in particular gives me pause again about my magical proposal. Changing that behaviour could indeed be simply trading one class of common error for another, even more annoying class. Warnings *should* be the right answer, but the truth is that essentially no one reads them, *and* once you’ve figured out the issue, they are a nuisance. However perhaps we can make more of an effort to make our warnings more “full-bodied”, including plain english explanations and links to the relevant documentation page.
@Tom, I think actually adding a 12-bit option to rescale intensity, (rather than currently having to define `in_range=(0, 4096)`) would be very useful. There’s some room for lots of utility functions, maybe even in their own top-level `skimage.data_types` submodule.
Two other proposals:
- put a big warning on the project homepage: “black images? out of range errors? See this page!” The page would be the data types page, to which we must add a “troubleshooting common data type errors” section.
- add a poll on a GitHub issue or on the site: “Did you spend 2h debugging something related to the image data type and range? Please vote on what you think would have helped!” Then we can link to that every time someone stumbles on this.
Juan.
On 30 Mar 2018, 4:19 PM -0400, Thomas Caswell <tcaswell@gmail.com>, wrote:
Automatically picking bit-depth based on value seems dangerous, but a `guess_best_dtype(input_data: np.array) -> dtype` helper function would be useful.
Tom
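A minimal sketch of what such a helper might look like. The name `guess_best_dtype` comes from Tom's message, but the behaviour here is only an assumption: for integer input it returns the smallest NumPy integer dtype whose range covers the observed values, and it leaves every other dtype alone. It only reports a guess and never converts anything.

```python
import numpy as np


def guess_best_dtype(input_data):
    """Hypothetical helper: guess the smallest integer dtype that fits the data.

    Non-integer and empty inputs are returned with their dtype unchanged.
    """
    arr = np.asarray(input_data)
    if arr.size == 0 or arr.dtype.kind not in "iu":
        return arr.dtype

    lo, hi = int(arr.min()), int(arr.max())
    # Signed candidates only when negative values are present.
    if lo < 0:
        candidates = (np.int8, np.int16, np.int32, np.int64)
    else:
        candidates = (np.uint8, np.uint16, np.uint32, np.uint64)

    for candidate in candidates:
        info = np.iinfo(candidate)
        if info.min <= lo and hi <= info.max:
            return np.dtype(candidate)
    return arr.dtype
```

Because the result is only a suggestion, the caller stays in control of whether to reinterpret the image, which avoids the danger of picking a bit depth automatically.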
On Fri, Mar 30, 2018 at 10:10 AM Gregory Lee <grlee77@gmail.com> wrote:
On Thu, Mar 29, 2018 at 1:46 PM, Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
I think maybe 50% of our bug reports/help requests have to do with image data types. Does anyone want to express an opinion about how we can fix things?
My humble (really) suggestions, *to start* (i.e. more needs to be done than this):
* If a 16-bit or higher image has no values above 4096 or below 0, treat the image as 12-bit. This is a very common image type for some reason.
One common source for 12-bit is the DICOM standard used by industry for medical imaging.
* If an integer image has no values above 255, treat it as an 8-bit image. This also happens a lot.
* If a floating point image has values outside [0, 1], don’t croak, just accept it. (This might have already happened?) If it has values only in [0, 1/255], and the user wants to convert to uint8, use the input range as the range.
I am in favor of accepting arbitrarily scaled floats unless the algorithm depends on values being within a particular range (not sure if we have many of these?). We do already allow unscaled floats in some places (e.g. compare_nrmse, etc), but it is not very consistent. For example, I recently noticed that denoise_wavelet enforces floats to be in [0, 1] (or [-1, 1]), but it would work equally well for unscaled data.
Some of these, especially the last one, may appear too magical, and in some ways I think they are, but honestly, given the frequency of problems that we get because of this, I think it’s time to suck it up and really work on doing what most of our users want most of the time. We don’t need to coddle the power users — they can be annoyed and micromanage the image range properly. To paraphrase a tweet I saw once (sorry, couldn’t find attribution): “edge cases should be used to check the design, not drive it.”
Applied to this case, we shouldn’t scale a uint32 image by 2**(-32) just because we can come up with a test case where this is useful.
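The heuristics proposed above can be sketched roughly as follows. This is not scikit-image code: `guess_intensity_range` is a hypothetical name, and the sketch assumes 4095 (2**12 - 1) as the largest 12-bit value and treats floats inside [0, 1] as conventionally scaled.

```python
import numpy as np


def guess_intensity_range(image):
    """Hypothetical: guess the (min, max) intensity range an image was meant to use."""
    image = np.asarray(image)
    lo, hi = image.min(), image.max()

    if image.dtype.kind in "iu":
        # Integer image with nothing above 255: treat as 8-bit.
        if lo >= 0 and hi <= 255:
            return (0, 255)
        # 16-bit or wider image with nothing above 4095: treat as 12-bit.
        if lo >= 0 and hi <= 4095 and image.dtype.itemsize >= 2:
            return (0, 4095)
        # Otherwise fall back to the full range of the dtype.
        info = np.iinfo(image.dtype)
        return (int(info.min), int(info.max))

    # Floats confined to [0, 1/255]: use the input range itself.
    if lo >= 0 and hi <= 1 / 255:
        return (float(lo), float(hi))
    # Floats inside [0, 1]: conventional scaling.
    if lo >= 0 and hi <= 1:
        return (0.0, 1.0)
    # Arbitrarily scaled floats: accept the data range as is.
    return (float(lo), float(hi))
```

Note that the low-signal case mentioned elsewhere in the thread (a genuinely dim 16-bit image) would be misread by the first two branches, which is why any such guess would need an option to turn it off.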
Some of these problems would be alleviated by some consistent metadata conventions.
Juan.
_______________________________________________ scikit-image mailing list scikit-image@python.org https://mail.python.org/mailman/listinfo/scikit-image

On Sun, 01 Apr 2018 13:47:41 -0400, Juan Nunez-Iglesias wrote:
Ralf’s comment in particular gives me pause again about my magical proposal. Changing that behaviour could indeed be simply trading one class of common error for another, even more annoying class.
It's worse than that: it prevents one specific use-case (the one Ralf mentioned, e.g., where you have very low signal, but correctly so).
Warnings *should* be the right answer, but the truth is that essentially no one reads them, *and* once you’ve figured out the issue, they are a nuisance. However perhaps we can make more of an effort to make our warnings more “full-bodied”, including plain english explanations and links to the relevant documentation page.
I am strongly in favor of more expansive warning and error messages.
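As an illustration of what a more expansive warning could look like, here is a small sketch. The function name `warn_if_low_range`, the 4095 threshold, the message wording, and the documentation URL are all assumptions for the example, not existing scikit-image behaviour.

```python
import warnings

import numpy as np

# Assumed location of the data types page; check against the current docs.
DTYPE_DOCS_URL = "https://scikit-image.org/docs/stable/user_guide/data_types.html"


def warn_if_low_range(image):
    """Hypothetical check: warn when a wide integer image uses only a small range."""
    image = np.asarray(image)
    if image.size == 0 or image.dtype.kind not in "iu":
        return

    dtype_max = int(np.iinfo(image.dtype).max)
    data_max = int(image.max())
    # Only 16-bit or wider integer images with small values are flagged.
    if image.dtype.itemsize > 1 and data_max <= 4095:
        warnings.warn(
            f"Image has dtype {image.dtype} (maximum value {dtype_max}) but its "
            f"largest value is {data_max}. Converting it using the full dtype "
            f"range will produce a very dark or black image. If this is 8-bit "
            f"or 12-bit data stored in a wider type, rescale it explicitly "
            f"first. See {DTYPE_DOCS_URL} for details.",
            UserWarning,
            stacklevel=2,
        )
```

The message states what was observed, what will go wrong, and where to read more, so that it is still useful to someone who has never seen the data types page.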
- put a big warning on the project homepage: “black images? out of range errors? See this page!” The page would be the data types page, to which we must add a “troubleshooting common data type errors” section.
An FAQ section in the docs, linked to from the front page?
Best regards
Stéfan

On Mon, Apr 2, 2018, at 6:59 AM, Stefan van der Walt wrote:
On Sun, 01 Apr 2018 13:47:41 -0400, Juan Nunez-Iglesias wrote:
Ralf’s comment in particular gives me pause again about my magical proposal. Changing that behaviour could indeed be simply trading one class of common error for another, even more annoying class.
It's worse than that: it prevents one specific use-case (the one Ralf mentioned, e.g., where you have very low signal, but correctly so).
No: we are arguing about default behaviour. I would always envision having the option to turn this off.
I am strongly in favor of more expansive warning and error messages.
Great!
- put a big warning on the project homepage: “black images? out of range errors? See this page!” The page would be the data types page, to which we must add a “troubleshooting common data type errors” section.
An FAQ section in the docs, linked to from the front page?
Yes!
Stéfan, btw, one last Q: what do you think about my suggestions for the (u)int32 and (u)int64 image types? I feel like assuming a range up to 2**32 when converting is *never* useful.
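To make the uint32 concern concrete, here is a small numerical illustration in plain NumPy. It is not scikit-image's conversion code, and the pixel values are made up: it simply compares dividing by the full dtype range with dividing by the range of the data.

```python
import numpy as np

# Made-up uint32 image whose values occupy only a tiny part of the dtype range.
image = np.array([[0, 500], [1000, 2000]], dtype=np.uint32)

# Scaling by the full uint32 range squashes everything towards zero.
by_dtype = image / np.iinfo(np.uint32).max

# Scaling by the observed data range uses the whole [0, 1] interval.
by_data = (image - image.min()) / (image.max() - image.min())

print(by_dtype.max())  # roughly 4.7e-07
print(by_data.max())   # 1.0

# After conversion to 8-bit, the dtype-scaled version is entirely black.
print((by_dtype * 255).astype(np.uint8).max())  # 0
```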