Hi all,

As a scientific image user, I have been following this difficult thread.
Let me first pay my respects that these difficult and, by nature, opinionated (which is good!) discussions are being conducted in such a civil manner!
As a member of a technical committee for Python software myself, I know how hard this can be.

Now to the issue at hand: I was wondering if this could be tackled the way it's done in space/tech engineering, with a requirements document that everyone agrees on, from which perhaps the one and only obvious solution will emerge?

I wanted to mention my personal requirements for working with an image library.
Please forgive me if all of this already happens in skimage, but it's been a while since I last used it:

First, and for me the most important: 
Pixel values are sacred and shall never be changed without letting the user know.

I'm almost sure this is the case with skimage now, but in the early days I remember being highly surprised, even annoyed, when some routine simply insisted that the input data had to be in a given format and that the result would come out in another, no matter what went in. That simply made it less useful for me (no complaint; I know I could have submitted some PRs ;) ).

I will admit that we instrumentalists are completely ignorant of certain standards in proper image formats; we simply use them as co-located data containers.
This means they can be ANY format:
* Integers (both signed and unsigned)
  - with counts as high as the digitized signal required for determining the dynamic range, sometimes negative because some weird amplifier would randomly suck off electrons; who knows what the engineers are cooking with ... ;)
* Floats, often representing physical values after the integer version was calibrated, but with absolutely no sensible/reasonable way to force them into some kind of range. The pixel values represent physical quantities; they don't care that a float image "shouldn't" be larger than 1.0.

But the fact is, ALL of these pixel values are measurements with a meaning, and they absolutely need to be preserved.
This statement needs to be qualified with "within reason": obviously, a "wanted" operation like a median filter to remove noise will change pixel values, but it is range preserving and the meaning of the data isn't lost.
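To make the "within reason" caveat concrete, here is a small numpy sketch (with made-up data, not any skimage routine): a median filter alters individual values, but never leaves the range of the input, so the counts keep their physical meaning.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical 1-D detector trace with one noise spike, in raw counts.
trace = np.array([400, 402, 9000, 405, 401, 398, 404], dtype=np.int32)

# A 3-sample median filter: changes values, but each output is one of
# the inputs in its window, so the result stays within the input range.
windows = sliding_window_view(trace, 3)
filtered = np.median(windows, axis=1)

print(filtered)                       # spike removed
print(filtered.min() >= trace.min())  # True
print(filtered.max() <= trace.max())  # True
```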

I understand that certain algorithms require the incoming image to be in a certain format and range. If no "standard" wrapper can be identified that transforms the data into the required range and back again, the user should at least be pointed to workarounds, not left alone with an error message saying the format doesn't match the algorithm.
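One possible shape for such a wrapper, as a sketch only (this is not an existing skimage function, and `with_unit_range` and `func` are hypothetical names): affinely map the data into [0, 1], run the range-restricted algorithm, then invert the mapping.

```python
import numpy as np

def with_unit_range(image, func):
    """Run `func`, which expects floats in [0, 1], on arbitrary-range
    data, then map the result back to the original range.

    Illustrative helper; assumes `func` is roughly range preserving
    on [0, 1] and that the image is not constant.
    """
    lo, hi = image.min(), image.max()
    scaled = (image - lo) / (hi - lo)    # forward transform into [0, 1]
    result = func(scaled)                # algorithm that needs [0, 1]
    return result * (hi - lo) + lo       # back-transform

# Physical values far outside [0, 1]:
img = np.array([[-50.0, 300.0], [120.0, 875.0]])
out = with_unit_range(img, lambda x: x)  # identity stands in for a filter
print(np.allclose(out, img))             # True -- values restored
```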

I myself am lucky not to have a lot of code that I would need to change, so I wouldn't really mind any import name changes. This is therefore much more a discussion for the maintainers about which way minimizes the convolution of maintainer_effort with user_pain. But honestly, knowing how hard it is to find extra time for a passion volunteer-effort project, I'd almost always go for "least effort", because I think the community will come around (as can be seen with cv2 and other examples).

I just wanted to emphasize how important the pixel values can be for us, as they literally represent the bearers of truth from outer space, so to speak, and any change to their values should be made only under full consideration of the consequences.

My 2 opinionated cents.
As always, thanks so much for everybody's effort on this project. We will soon have a technical-committee-reviewed package of many of my planetary science tools for data retrieval and data reading coming out, so I'm now getting a feeling for how much damned work it is to design tools for the "community"...

Best regards,

On Sun, Jul 25, 2021 at 11:40 AM Josh Warner <silvertrumpet999@gmail.com> wrote:
I'll be brief as my internet is currently down, replying from mobile.

Of these examples and similar ones, I would characterize them in a couple of categories:
1. Data range user errors: the user used a type (almost always an overly large one) for their actual data and ends up with an image that looks all black/gray/etc.
2. Signed data, of course, needs to include the symmetric range [-1, 1] as a generalization of the unsigned workflow; this happens naturally since float64 is signed.
3. Overshoots/undershoots due to expected computational effects, as mentioned elsewhere in this thread; the user may or may not want to retain these, and they are uncommon.
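Category 1 is easy to reproduce with a made-up example: 12-bit sensor data stored in a uint16 container occupies only about 6% of the dtype's range, so any tool that normalizes by the full dtype range renders it nearly black on screen.

```python
import numpy as np

# Made-up 12-bit data (max 4095) stored in a uint16 container (max 65535).
img = np.array([0, 1024, 2048, 4095], dtype=np.uint16)

# Normalizing by the dtype's full range -- the "all black" effect:
by_dtype = img / np.iinfo(np.uint16).max
print(by_dtype.max())            # ~0.0625: essentially invisible on screen

# Normalizing by the data's actual range instead:
by_data = img / img.max()
print(by_data.max())             # 1.0
```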

These do represent a low-level support burden, but since the story is predictable, it presents an opportunity to guide users toward a FAQ or similar before they file a new issue.  That would certainly be less disruptive than the solutions proposed!

I would assert that anyone working in this space NEEDS to understand their data and its representation, or they will have serious problems.  It is so foundational that insulating them from the concept doesn't do them any favors.

That said, the workings and logic of dtype.py are somewhat opaque.  Could a featured, direct, high-yield document explaining our conversion behavior, plus a FAQ, serve users just as well as the heroic efforts suggested?


On Sat, Jul 24, 2021, 19:59 Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I'm very glad to hear from you, Josh 😊, but I'm 100% convinced that removing the automatic rescaling is the right path forward. Stéfan, "floats between [0, 1]" is easy enough to explain, except when it isn't (signed filters), or when we automatically rescale int32s in [0, 255] to floats in [0, ~2**(-23)], or uint16s in [0, 4095] to floats in [0, ~2**(-4)], etc. I can't count the number of times I've had to point users to "Image data types and what they mean". Floats in [0, 1] is certainly not simpler to explain than "we use floats internally for computation, period." Yes, there is a chance that we'll now get users confused about uint8 overflow/underflow, but at least then we can teach them about fundamental computer science principles, rather than about how skimage does things "just so".
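The arithmetic behind those examples, as a back-of-the-envelope check (assuming the conversion divides by the dtype's positive maximum, 2**31 - 1 for int32 and 2**16 - 1 for uint16):

```python
import math

# int32 data in [0, 255], divided by 2**31 - 1:
int32_top = 255 / (2**31 - 1)
print(math.log2(int32_top))    # ~ -23: the upper bound lands near 2**(-23)

# uint16 data in [0, 4095], divided by 2**16 - 1:
uint16_top = 4095 / (2**16 - 1)
print(math.log2(uint16_top))   # ~ -4: the upper bound lands near 2**(-4)
```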

As Matthew pointed out, the user is best placed to know how to manage their data scales. When we do it automagically, we often mess up. And Stéfan, to steal from your approach, we can look to our values to guide our decision-making: "we don't do magic." Let's remove the last few places where we do.

Matthew, apologies for sounding callous to users — that is absolutely not my intent! Hence this email thread. The question when aiming for a new API is how to move the community forward without fracturing it. My suggestion of "upgrade pressure" was aimed at doing this, with the implicit assumption that *limited* short term pain would result in higher long-term gain — for all our users.

I'm certainly starting to be persuaded that skimage2 is indeed the best path forward, mainly so that we don't invalidate old Q&As and tutorials. We can perhaps do a combination, though:

* skimage 0.19 is the last "real" release with the old API.
* skimage2 2.0 is the next real release.
* When skimage2 2.0 is released, we release skimage 0.20, which is 0.19 plus a warning that scikit-image is deprecated and no longer maintained, pointing to the migration guide; anyone who wants to keep using the deprecated API pins to 0.19 explicitly.

That probably satisfies my "migration pressure" requirement.


On Fri, 23 Jul 2021, at 8:29 PM, Stefan van der Walt wrote:
Hi Tom,

On Fri, Jul 23, 2021, at 17:57, Thomas Caswell wrote:

Where the issues tend to show up is if you have enough dynamic range that the small end is less than the difference between adjacent representable numbers at the high end, e.g.

In [5]: 1e16 == (1e16 + 1)
Out[5]: True

This issue would crop up if you had, e.g., uint64 images utilizing the full range.  We don't support uint64 images, and uint32 is still OK on this front if you use `float64` for calculations.
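A quick illustration of that boundary (plain Python, no skimage involved): float64 has a 53-bit significand, so uint32 values round-trip through float64 exactly, while full-range uint64 values do not.

```python
# Integers are exact in float64 only up to 2**53:
print(float(2**53) == float(2**53 + 1))      # True -- precision lost

# uint32 max (2**32 - 1) survives a round trip through float64:
print(int(float(2**32 - 1)) == 2**32 - 1)    # True

# uint64 max (2**64 - 1) does not:
print(int(float(2**64 - 1)) == 2**64 - 1)    # False
```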

In some cases the scaling/unscaling does not work out the way you wish it would.  While it is possible that the issues we are having are related to what we do with the results, forcing data into [0, 1] restricts you to ~15 orders of magnitude on the whole image, which seems not ideal.  And while it may not be common, the fact that Matplotlib got those bug reports says we do have users with such extreme dynamic range in the community!

15 orders of magnitude is enormous!  Note that all our floating point operations internally currently happen with float64 anyway—and this is pretty much the best you can do here.

The other issue you mention is due to interpolation that sometimes goes outside the desired range; but this is an expected artifact of interpolation (which we typically have the `clip` flag for).
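As a sketch of that artifact (a hand-rolled Catmull-Rom cubic for illustration, not skimage's actual interpolator): resampling a step edge whose samples all lie in [0, 1] produces values outside that range, which is exactly what a `clip` option papers over.

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, t):
    """Evaluate a Catmull-Rom cubic between p1 and p2 at t in [0, 1]."""
    return 0.5 * (2 * p1
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)

# A step edge: all samples lie in [0, 1].
samples = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# Interpolate densely across each interior segment.
values = [catmull_rom(*samples[i:i + 4], t)
          for i in range(len(samples) - 3)
          for t in np.linspace(0, 1, 21)]

print(min(values))   # < 0: undershoot below the data range
print(max(values))   # > 1: overshoot above the data range

# The usual remedy, at the cost of discarding the overshoot entirely:
clipped = np.clip(values, 0.0, 1.0)
```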

To be clear, I'm trying to get at the underlying issues here and identify them, not to dismiss your concerns!

Best regards,
scikit-image mailing list -- scikit-image@python.org
To unsubscribe send an email to scikit-image-leave@python.org
Member address: jni@fastmail.com
