On Thu, Jul 22, 2021, at 15:12, Josh Warner wrote:
It's also worth considering that there is a substantial corpus of scikit-image teaching material out there. The majority of it we do not control, so it cannot be updated or edited. The first hits on YouTube for tutorials are not the most recent, but older ones with lots of views. In virtually all cases, we tell users "anything in, anything out", and they will continue to hear and read this regardless of how strongly we might attempt to message around the change. As a result, I expect the ongoing support burden will actually be worse with this change than the low-level support burden we've seen for years due to people not understanding datatype conversions.
So, I'll directly ask the question we're dancing around: *Is it worth making preserve_range the default?* As someone who would benefit from these changes, I am honestly no longer convinced it is. The workarounds for this problem are trivial from my standpoint as a user who does actually care about my data range, whereas the consequences of changing it at the package level are substantial and insidious, if not outright dangerous.
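For concreteness, here is the kind of workaround Josh is alluding to. Under the current model, a uint8 input comes back as floats in [0, 1] (skimage divides by 255), so a caller who cares about the original range scales back by the dtype maximum afterwards. A minimal numpy-only sketch, where `some_filter` is a stand-in (not a real skimage function) for anything that follows the "float in [0, 1], float out [0, 1]" convention:

```python
import numpy as np

def some_filter(image):
    # Stand-in for a skimage function that follows the
    # "float in [0, 1], float out [0, 1]" convention.
    return np.clip(image, 0.0, 1.0)

image = np.array([[0, 128, 255]], dtype=np.uint8)

# Current model: the library converts uint8 to float in [0, 1] ...
as_float = image / 255.0
result = some_filter(as_float)

# ... and a range-caring caller scales back afterwards.
restored = np.rint(result * 255.0).astype(np.uint8)
```

Three lines of caller-side code, rather than a package-wide default change.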
Josh makes a good point, and perhaps we should take a step back before we get too carried away. From this discussion, there are just about as many opinions on how to do a transition as there are participants. We do not have consensus. I think one reason is that we have not taken the time to carefully write up the various categories of users, their needs, and how this change would impact them. We need to compare code snippets, to see how APIs would look before and after.

But before we go there, Josh's comment really made me wonder: why are we so convinced that the current model is inherently flawed? Imagine, for example, that we simplified the existing input model and said: input is always floats ranged [0, 1], output is always floats ranged [0, 1]. In a very few select cases this will have memory implications (CLAHE). And, yes, it's a tad annoying for people with, say, temperature data. But not much:

    scale = image.max()
    image = image / scale
    out = skimage.some.func(image)
    out = out * scale

We can then drop `preserve_range`. The "[0, 1] float in, [0, 1] float out" data model is *trivial* to explain, and there are no surprises. If we error on other input, it will break some older scripts, but we can be descriptive in the error message. Compare this to some of the more extreme changes we've discussed so far. Writing utility functions to make common tasks easier is a lot more straightforward than forcing everyone to upgrade their scripts.

Now, sure, there is a philosophical question too: should we forever be beholden to API decisions of the past? Don't we want a mechanism to eventually move in a different direction? Perhaps, but then I would argue as in the first paragraph: we need to study carefully exactly who the users are that we have in mind, and what their needs are. We may even want to do a survey to see how prevalent their needs are. I.e., we would need a much more detailed SKIP, jointly written by the developers / community.
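To illustrate the "utility functions" route: a tiny helper can hide the scale/unscale dance entirely. This is a hypothetical sketch (the name `apply_preserving_range` is mine, not an skimage API), using a min-max variant of the rescaling above so that data like temperatures, which need not start at zero, round-trip too:

```python
import numpy as np

def apply_preserving_range(func, image, *args, **kwargs):
    # Hypothetical helper, not part of skimage: rescale to [0, 1],
    # apply a "[0, 1] float in, [0, 1] float out" function, then
    # restore the original data range.
    lo, hi = float(image.min()), float(image.max())
    scaled = (image - lo) / (hi - lo)
    out = func(scaled, *args, **kwargs)
    return out * (hi - lo) + lo

# Example with temperature-like data and an identity "filter":
temps = np.array([15.0, 20.0, 30.0])
result = apply_preserving_range(lambda im: im, temps)
```

Shipping a helper like this (with a better name and edge-case handling for constant images) is far cheaper than changing the package-wide default.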
So, let's take a careful look at Josh's suggestion and ask ourselves whether it is absolutely impossible to find a way out of this without silent & implicit API breakage. Best regards, Stéfan