I'll be brief as my internet is currently down, replying from mobile.

Of these examples and similar, I would characterize them in a few categories:
1. Data-range user errors - the user used an (almost always overly large) type for their actual data, and they end up with an image that looks all black/gray/etc.
2. Signed data, which of course needs the symmetric range [-1, 1] as a generalization of the unsigned workflow; this happens naturally since float64 is signed.
3. Overshoots/undershoots due to expected computational effects, as mentioned elsewhere in this thread; the user may or may not want to retain these, and they are uncommon.
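Category 1 can be sketched in a few lines. This is a hypothetical illustration (the variable names and the 12-bit camera scenario are my own, not from the thread): data stored in a uint16 container but occupying only [0, 4095] looks nearly black when normalized by the dtype maximum.

```python
import numpy as np

# Hypothetical category-1 scenario: a 12-bit camera image stored in a
# uint16 container.  Values occupy [0, 4095], but a viewer that assumes
# the full uint16 range [0, 65535] renders it nearly black.
rng = np.random.default_rng(0)
img = rng.integers(0, 4096, size=(4, 4), dtype=np.uint16)

# Naive display normalization: divide by the dtype maximum.
displayed = img / np.iinfo(np.uint16).max
print(displayed.max())  # at most 4095 / 65535, i.e. ~0.0625

# The user's fix is to rescale by the *data* range instead:
rescaled = img / img.max()
```

The point being: nothing here is a library bug, only a mismatch between the container dtype and the actual data range.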

These do represent a low-level support burden - but since the story is predictable, it presents an opportunity to guide users toward a FAQ or similar before they file a new Issue. That would certainly be less disruptive than the solutions proposed!

I would assert that anyone working in this space NEEDS to understand their data and its representation, or they will have serious problems.  It is so foundational that insulating them from the concept doesn't do them any favors.

That said, the workings and logic of dtype.py are somewhat opaque.  Could a featured, direct, high-yield document explaining our conversion behavior, plus a FAQ, serve users just as well as the heroic efforts suggested?


On Sat, Jul 24, 2021, 19:59 Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I'm very glad to hear from you, Josh 😊, but I'm 100% convinced that removing the automatic rescaling is the right path forward. Stéfan, "floats between [0, 1]" is easy enough to explain, except when it isn't (signed filters), or when we automatically rescale int32s in [0, 255] to floats in roughly [0, 2**(-23)], or uint16s in [0, 4095] to floats in [0, 2**(-4)], etc. I can't count the number of times I've had to point users to "Image data types and what they mean". Floats in [0, 1] is certainly not simpler to explain than "we use floats internally for computation, period." Yes, there is a chance that we'll now get users confused about uint8 overflow/underflow, but at least then we can teach them fundamental computer science principles, rather than how skimage does things "just so".
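The rescaling arithmetic above can be checked with plain NumPy. This is a sketch of the described conversion rule (divide by the dtype maximum), not a call into skimage itself:

```python
import numpy as np

# Sketch of the int -> float rescaling rule described above: signed
# integers map to [-1, 1], unsigned to [0, 1], by dividing by the
# dtype's maximum value.
int32_values = np.array([0, 255], dtype=np.int32)
as_float = int32_values / np.iinfo(np.int32).max
# 255 / (2**31 - 1) ~= 1.19e-7 ~= 2**(-23): the data is crushed near zero.

uint16_values = np.array([0, 4095], dtype=np.uint16)
as_float2 = uint16_values / np.iinfo(np.uint16).max
# 4095 / 65535 ~= 0.0625 = 2**(-4)
```

Either way, a user who doesn't know this rule exists will just see an image that "looks wrong".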

As Matthew pointed out, the user is best placed to know how to manage their data scales. When we do it automagically, we often mess up. And Stéfan, to steal from your approach, we can look to our values to guide our decision-making: "we don't do magic." Let's remove the last few places where we do.

Matthew, apologies for sounding callous to users — that is absolutely not my intent! Hence this email thread. The question when aiming for a new API is how to move the community forward without fracturing it. My suggestion of "upgrade pressure" was aimed at doing this, with the implicit assumption that *limited* short term pain would result in higher long-term gain — for all our users.

I'm certainly starting to be persuaded that skimage2 is indeed the best path forward, mainly so that we don't invalidate old Q&As and tutorials. We can perhaps do a combination, though:

skimage 0.19 is the last "real" release with the old API
skimage2 2.0 is the next real release
when skimage 2.0 is released, we release skimage 0.20, which is 0.19 with a warning that this version of scikit-image is deprecated and no longer maintained, pointing to the migration guide; if you want to keep using the deprecated API, pin to 0.19 explicitly.

That probably satisfies my "migration pressure" requirement.


On Fri, 23 Jul 2021, at 8:29 PM, Stefan van der Walt wrote:
Hi Tom,

On Fri, Jul 23, 2021, at 17:57, Thomas Caswell wrote:

Where the issues tend to show up is if you have enough dynamic range that the small end is less than the difference between adjacent representable numbers at the high end, e.g.

In [5]: 1e16 == (1e16 + 1)
Out[5]: True

This issue would crop up if you had, e.g., uint64 images utilizing the full range.  We don't support uint64 images, and uint32 is still OK on this front if you use `float64` for calculations.
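The uint32-is-safe claim follows from float64 having a 53-bit significand, so every integer up to 2**53 is exactly representable (and 2**32 < 2**53). A quick check:

```python
import numpy as np

# Adjacent representable float64 values near 1e16 are 2 apart, so
# adding 1 is lost entirely (Tom's example above).
assert 1e16 == 1e16 + 1
print(np.spacing(1e16))  # gap to the next float64: 2.0

# Every uint32 value is exactly representable in float64 (2**32 < 2**53),
# so uint32 data survives a round-trip through float64 unchanged.
x = np.array([0, 1, 2**32 - 1], dtype=np.uint32)
assert np.all(x.astype(np.float64).astype(np.uint32) == x)

# uint64 does not: above 2**53, integers get rounded on conversion.
assert float(2**53 + 1) == float(2**53)
```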

In some cases the scaling/unscaling does not work out the way you wish it would.  While it is possible that the issues we are having are related to what we are doing with the results, forcing data into [0, 1] restricts you to ~15 orders of magnitude across the whole image, which seems not ideal.  It may not be common, but the fact that Matplotlib got those bug reports shows we do have users with such extreme dynamic range in the community!

15 orders of magnitude is enormous!  Note that all our floating-point operations internally currently happen in float64 anyway, and this is pretty much the best you can do here.
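The "~15 orders of magnitude" figure comes straight from float64's machine epsilon, which NumPy reports directly:

```python
import numpy as np

# float64 carries ~15-16 significant decimal digits: the relative
# spacing between adjacent values is machine epsilon, ~2.2e-16.
print(np.finfo(np.float64).eps)        # 2.220446049250313e-16
print(np.finfo(np.float64).precision)  # 15 (decimal digits)
```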

The other issue you mention is due to interpolation that sometimes goes outside the desired range; but this is an expected artifact of interpolation (which we typically have the `clip` flag for).
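That interpolation artifact is easy to reproduce. A minimal sketch, using SciPy's `CubicSpline` with natural boundary conditions as a stand-in for whatever spline-based resampling a given filter uses (the step data and grid are my own illustration):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Cubic interpolation across a sharp edge rings: the interpolant
# overshoots/undershoots the input range [0, 1].
x = np.arange(6)
step = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
fine = CubicSpline(x, step, bc_type='natural')(np.linspace(0, 5, 201))

print(fine.min() < 0 or fine.max() > 1)  # True: values escape [0, 1]

# Clipping restores the expected range, which is what a `clip`-style
# flag does after resampling.
clipped = np.clip(fine, 0.0, 1.0)
```

Whether the user wants the raw ringing or the clipped result is exactly the judgment call discussed earlier in the thread.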

To be clear, I'm trying to get at the underlying issues here and identify them, not to dismiss your concerns!

Best regards,
scikit-image mailing list -- scikit-image@python.org
To unsubscribe send an email to scikit-image-leave@python.org
Member address: jni@fastmail.com

scikit-image mailing list -- scikit-image@python.org
To unsubscribe send an email to scikit-image-leave@python.org
Member address: silvertrumpet999@gmail.com