Request for comments: plans for breaking changes in scikit-image 1.0
Dear skimagers,

We are aiming to release scikit-image 1.0 near the end of the year. We are, however, planning to make a number of breaking changes in the API that will affect downstream libraries. We have published a proposal for how we plan to do this at https://bit.ly/skip-3. The gist of it is:

- we'll release 0.19 in the coming weeks.
- we'll release 0.20 immediately after, which will be exactly the same but with a warning to pin scikit-image to `<0.20` (for those that want to stay in 0.x land indefinitely) or `!=0.20.*` (for those that want to be "on the ball" when 1.0 is released and update their code as soon as possible).
- we'll publish a transition guide along with 1.0rc0, and maintain 0.19.x with bug fixes for another year to give users time to transition.

The document describes alternative approaches ("change the wheels on the bus while still driving it" or "make a new bus with a new name") and why the core team ultimately chose to promote the current approach. Nonetheless, scikit-image is committed to being a community-led project, so we are still gathering feedback and can make substantive modifications to the plan. Please don't hesitate to voice your concerns so we can make the best choice for our entire community going forward!

Thank you,
Juan.
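The two pin strategies can be sanity-checked with the `packaging` library, the same machinery pip uses to evaluate version specifiers (a sketch; `packaging` is assumed to be installed, as it is alongside any recent pip):

```python
from packaging.specifiers import SpecifierSet

# "Stay in 0.x land indefinitely": excludes 0.20 and everything after.
stay_on_0x = SpecifierSet("<0.20")

# "Be on the ball for 1.0": skip only the 0.20.* warning release,
# so 1.0 is picked up as soon as it appears.
ready_for_1 = SpecifierSet("!=0.20.*")

print("0.19.3" in stay_on_0x)   # True  - still on 0.x
print("1.0.0" in stay_on_0x)    # False - 1.0 excluded
print("0.20.0" in ready_for_1)  # False - warning release skipped
print("1.0.0" in ready_for_1)   # True  - 1.0 allowed
```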
Hi,

On Mon, Jul 19, 2021 at 5:34 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
Dear skimagers,
We are aiming to release scikit-image 1.0 near the end of the year. We are, however, planning to make a number of breaking changes in the API that will affect downstream libraries. We have published a proposal for how we plan to do this at https://bit.ly/skip-3. The gist of it is:
- we'll release 0.19 in the coming weeks.
- we'll release 0.20 immediately after, which will be exactly the same but with a warning to pin scikit-image to `<0.20` (for those that want to stay in 0.x land indefinitely) or `!=0.20.*` (for those that want to be "on the ball" when 1.0 is released and update their code as soon as possible).
- we'll publish a transition guide along with 1.0rc0, and maintain 0.19.x with bug fixes for another year to give users time to transition.
Please do give the appropriate weight to my remarks, given my tiny contributions to scikit-image, but I was rather scared by reading this suggestion.

I can see that you do need to change the API, and that the two realistic options are:

* Make a breaking 1.0 release
* Make a new package, e.g. skimage2 or similar.

I'm afraid I wasn't completely sure whether the 1.0 option would result in breaking what I call the Konrad Hinsen rule for scientific software:

"""
Under (virtually) no circumstances should new versions of a scientific package silently give substantially different results for the same function / method call from a previous version of the package.
"""

The idea there is that lots of scientific software is in the form of packages or scripts that are not well maintained, but do often get picked up and re-used, if only to replicate results. They will very rarely specify exact package versions. It is a very serious problem if a later version of a package actually does something substantially different with the same function or method call than it did when the script was written.

Fixing clear bugs and changing an algorithm's implementation are fine - the results need not be absolutely identical, only compatible. It is also fine to raise an error, for example for expired deprecations. In that case the person using the script has a warning, and can investigate. The disaster is if they don't know that the result has changed.

So - do the changes all raise errors for the previous, now expired API? Or do they break the Hinsen rule?

Cheers,
Matthew
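The error-raising option Matthew prefers can be sketched generically; `removed`, `old_filter`, and `new_filter` below are hypothetical names for illustration, not skimage API:

```python
def removed(name, replacement, note=""):
    """Build a stub that fails loudly where behaviour would otherwise
    have changed silently: the Hinsen-rule-friendly option."""
    def stub(*args, **kwargs):
        raise RuntimeError(
            f"{name} was removed; use {replacement} instead. {note}".strip()
        )
    return stub

# An old script calling this stops with an explanatory error instead
# of silently computing something different:
old_filter = removed(
    "old_filter", "new_filter",
    note="new_filter no longer rescales uint8 input to [0, 1].",
)
```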
Hi,

The other thing that occurred to me is the same thing that I am sure occurred to y'all - that is, the experience of the Python 2 / Python 3 transition. I think the basic message was that you can make big changes like that, but you have to support your users in maintaining code that works with both old and new versions, for a fairly long time - otherwise you risk leaving a lot of developers on the old version, waiting until the new version has been around long enough that it has completely replaced the old.

How practical will that be - supporting both scikit-image versions <=0.18 and >=1.0, within the same library?

Cheers,
Matthew
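Supporting both version ranges in one code base usually comes down to a version gate; a minimal sketch using the `packaging` library (the 1.0 behaviour described here is the proposal's, not a shipped API):

```python
from packaging.version import Version

def auto_rescales(skimage_version: str) -> bool:
    """True for 0.x releases, where many functions implicitly rescale
    integer images to [0, 1]; False from 1.0 on (per the proposal)."""
    return Version(skimage_version) < Version("1.0")

# A library supporting both generations could branch once, e.g. on
# skimage.__version__, instead of sprinkling checks everywhere:
print(auto_rescales("0.18.3"))  # True
print(auto_rescales("1.0.0"))   # False
```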
Hi Juan,

Thank you for writing up the SKIP; you clearly put a lot of thought and effort into it.

Reading the SKIP, and then reflecting on Matthew's email, I again started feeling very uncomfortable with what we are proposing to do here. My gut feeling has always been that the magical pinning solution is bound to cause trouble. I just can't shake the sense that it is going to cause a lot of headaches and confusion.

More importantly, thinking about what Matthew said and then looking at the list of silent API changes, there really is only one issue we need to address: data conversion. All the other API changes can be handled through a slightly more laborious process.

The data conversion issue is thorny. I know for a fact that a bunch of my code will break with the proposed skimage 1.0, because the assumption of 0-255 uint8 meaning 0-1 float is so deeply baked in. No matter how we handle it, it's going to be messy; but doing it silently, even if widely advertised, can be very disruptive.

Perhaps we should reconsider biting the bullet and going through the slower, trusted deprecation process? It is true that we are short on person-hours to do the work, but if we estrange part of our user base, increasing volunteer time will become even harder.

I'm sorry; I know we've had these discussions over and over, and that it is frustrating to have someone come on board and then fall off again. I keep wanting to be OK with a more radical move, but I am just not that confident we can pull it off.

Best regards,
Stéfan
Hi Stéfan, Matthew,

Thanks Matthew for weighing in. Stéfan, no worries about having second thoughts — the purpose of the SKIP process is to account for these! But, my responses below — I’m still in favour of the SKIP.

There are two main concerns: breaking the Hinsen rule, and supporting users through the transition. Regarding the first:
Perhaps we should reconsider biting the bullet and going through the slower, trusted deprecation process? It is true that we are short on person-hours to do the work, but if we estrange part of our user base increasing volunteer time will become even harder.
I think this is actually *counterproductive* in the context of the Konrad Hinsen rule: if we change slowly with a deprecation path of adding `preserve_range=False` whose default value migrates to True over two versions, then *you end up breaking the Hinsen rule*, just over a slightly longer timescale. And you don’t get your compatibility layer — earlier versions of the library don’t have `preserve_range`, so you can’t easily support the later version and the earlier version simultaneously.

In contrast, if we go with the 1.0 approach and bundle a large number of breaking API changes, the odds of breaking the Hinsen rule are much smaller: almost all skimage 0.x scripts will fail with skimage 1.0.

In my opinion, the Hinsen rule argues in favour of either adopting the SKIP, or going with the skimage1 approach (new package name). It absolutely does not argue in favour of the slow deprecation path: we’ve probably broken the Hinsen rule more than a few times with that approach, for example when we switched some regionprops from xy to rc coordinates. Code that measured object orientation in 0.14 will yield different results in 0.18 with no warning. (There is a warning in the intervening versions, but that doesn’t help users now.)

Regarding the transition:
How practical will that be - supporting both scikit-image versions <=0.18 and >=1.0, within the same library?
This one is bigger: currently, that’s expected to be more or less impossible, or extremely painful. The SKIP bets on most libraries/users making a clean transition. Indeed that didn’t work out well for Python 2/3, but I think this transition is easier by at least two orders of magnitude. The Python 2/3 transition required rebuilding all the libraries on PyPI, including, for compiled libraries, subtle issues with the ABI that I don’t understand to this day. The scikit-image transition will involve:

- find/replace for certain function/attribute names
- adding a call to `rescale_to_float` (or similar) before some function calls
- transposing images in some calls to `transform.warp`

I think these can be adequately addressed by a simple migration guide.

The added complication is downstream packages of downstream packages. If you have a “dependency diamond” where A depends on B and C, both of which depend on scikit-image, A is stuck on whatever versions both B and C support, which can dramatically slow down the transition. This was brought up by Mark during the API meetings; I will add an explicit note to the SKIP because it’s a very important point. Here again I think scikit-image’s job is two orders of magnitude easier than Python’s: to a first approximation, I think scikit-image is high-level enough that this situation will be quite rare, and certainly won’t propagate many levels up like it did with Python. I also hope that most people will not feel compelled to support 0.19 and 1.0 simultaneously, as people did with Python.

So, based on the above, in my opinion, the only candidates are:

- accept the SKIP
- create a new library
- do nothing

and of those three, the SKIP is the most attractive. The new library has even higher potential of fragmenting the community imho.

Having said this, I would definitely still entertain `skimage.v0.old_module`.
Alex was most reluctant about this, and I totally understand his reluctance, but it does fix the simultaneous dependency thing. To me it’s still much better than the slow deprecation path. Juan.
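The first two migration steps above can be sketched concretely; `rescale_to_float` is written out by hand here, since the name is explicitly tentative in the message:

```python
import numpy as np

def rescale_to_float(image):
    """Placeholder for the explicit conversion step: map an integer
    image onto [0, 1] the way skimage 0.x did implicitly."""
    image = np.asarray(image)
    if np.issubdtype(image.dtype, np.integer):
        info = np.iinfo(image.dtype)
        return (image.astype(np.float64) - info.min) / (info.max - info.min)
    return image.astype(np.float64)

# 0.x style:  some_filter(image)                    # implicit uint8 -> [0, 1]
# 1.0 style:  some_filter(rescale_to_float(image))  # explicit, visible
image = np.array([[0, 128, 255]], dtype=np.uint8)
out = rescale_to_float(image)  # values now lie in [0, 1]
```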
scikit-image mailing list -- scikit-image@python.org To unsubscribe send an email to scikit-image-leave@python.org https://mail.python.org/mailman3/lists/scikit-image.python.org/ Member address: jni@fastmail.com
Apologies for speaking out of turn - I am not a developer but a user, and one who might also have code that might break subtly. Allow me to make a suggestion: consider changing the name of the top-level namespace from `skimage` to `scimage`.

Such a name change would mean code using `import skimage` would not break, but the package could be deprecated and eventually abandoned. This would make the new API and functionality "opt-in", and so a deliberate act. It would be nearly painless, but is also a clear change that the downstream user would have to make. That is, if they discover a subtle bug in their code related to these changes, git bisect could show them that it happened when they switched from `skimage` to `scimage`, and so encourage them to go re-read the FAQ and release notes.

Since `scimage` is still short for "scikit-image", nothing else needs rebranding. You might even say (tongue-in-cheek, of course), "we decided to spell science with a 'c'".

Feel free to ignore, and thanks for considering.

Cheers,
--Matt Newville <newville at cars.uchicago.edu> 630-327-7411
Hi Matt,

Thank you for emailing, and not out of turn at all! Indeed we are advertising widely because we *really* want to hear from users like you!

Regarding renaming the package: while the new name is not a huge issue (we’d probably go with skimage1 or skimage2 in that case; BeautifulSoup has been importing from bs4 for years!), in our earlier discussions we felt that this risks fragmenting the community for a much longer period: people would never know to install scikit-image1 instead of scikit-image.

*If* you’re suggesting changing the import but not the package name, well, that’s definitely an option, but one that I think can be equivalently achieved with `skimage.v0` together with a change of ~all the top-level imports, i.e. *no* scikit-image 0.19 code would ever work silently in 1.0. That might be achievable by keeping the skimage 0.x test suite around and marking it as xfail. =) Or we just go with skimage.v1 and warn that all old skimage code should move to skimage.v0.

An issue with moving the top-level import but not the package name is that if the current import never changes and never stops working, then users will simply stay on 0.19 indefinitely and expect it to be supported indefinitely, which is the same as just making a new library. In other words, it removes *all* pressure to upgrade to the new API. That sounds great on the one hand, but also like a recipe for an endlessly fragmented community on the other. And anyway, it can be achieved on the user side by pinning the scikit-image dependency to `==0.19.*`.

Anyway, thank you for taking part in the discussion! We are definitely considering the top-level-import name change option seriously. At any rate, as mentioned in this thread and in the SKIP, the current deprecation policy in scikit-image means that you should never use scikit-image (or NumPy or SciPy) without an upper version pin, such as `<0.21`. If you do that, you’ll be golden no matter what happens. =)

See this pull request for more discussion: https://github.com/scipy/scipy/pull/12862

Thanks,
Juan.
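If a `skimage.v0` namespace were adopted, downstream code could opt in with a fallback import. A generic sketch (the `v0` layout is hypothetical, so it is demonstrated here against the stdlib `json` package, which has no such namespace):

```python
import importlib

def import_compat(pkg, versioned, module):
    """Prefer pkg.versioned.module (a frozen old API) if it exists,
    otherwise fall back to the plain pkg.module."""
    try:
        return importlib.import_module(f"{pkg}.{versioned}.{module}")
    except ImportError:
        return importlib.import_module(f"{pkg}.{module}")

# e.g., hypothetically: filters = import_compat("skimage", "v0", "filters")
decoder = import_compat("json", "v0", "decoder")
print(decoder.__name__)  # json.decoder
```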
On 20 Jul 2021, at 12:36 pm, Matt Newville <newville@cars.uchicago.edu> wrote:
Apologies for speaking out of turn - I am not a developer but a user and one who might also have codes that might break subtly. Allow me to make a suggestion: consider changing the name of the top-level namespace from `skimage` to `scimage`.
Such a name change would mean codes using `import skimage` would not break, but the package could be deprecated and eventually abandoned. This would make the new API and functionality be "opt-in", and so a deliberate act. It would be nearly painless, but is also a clear change that the downstream user would have to do. That is, if they discover a subtle bug in their code relate to these changes, git bisect could show them that it happened when they switched from `skimage` to `scimage` and so encourage them go re-read the FAQ and release notes.
Since `scimage` is still short for "scikit-image", nothing else needs rebranding. You might even say (tongue-in-cheek, of course), "we decided to spell science with a 'c'".
Feel free to ignore, Thanks for considering, Cheers,
On Mon, Jul 19, 2021 at 8:21 PM Juan Nunez-Iglesias <jni@fastmail.com <mailto:jni@fastmail.com>> wrote: Hi Stéfan, Matthew,
Thanks Matthew for weighing in. Stéfan, no worries about having second thoughts — the purpose of the SKIP process is to account for these! But, my responses below — I’m still in favour of the SKIP.
There are two main concerns: breaking the Hinsen rule, and supporting users through the transition. Regarding the 1st:
Perhaps we should reconsider biting the bullet and going through the slower, trusted deprecation process? It is true that we are short on person-hours to do the work, but if we estrange part of our user base increasing volunteer time will become even harder.
I think this is actually *counterproductive* in the context of the Konrad Hinsen rule: if we change slowly with a deprecation path of adding `preserve_range=False` whose default value migrates to True over two versions, then *you end up breaking the Hinsen rule*, just over a slightly longer timescale.
And you don’t get your compatibility layer — earlier versions of the library don’t have `preserve_range` so you can’t easily support the later version and the earlier version simultaneously.
In contrast, if we go with the 1.0 approach and bundle a large number of breaking API changes, the odds of breaking the Hinsen rule are much smaller: almost all skimage 0.x scripts will fail with skimage 1.0.
In my opinion, the Hinsen rule argues in favour of either adopting the SKIP, or going with the skimage1 approach (new package name). It absolutely does not argue in favour of the slow deprecation path: we’ve probably broken the Hinsen rule more than a few times with that approach. For example, when we switched some regionprops from xy to rc coordinates. Code that measured object orientation in 0.14 will yield different results in 0.18 with no warning. (There is warning in the intervening versions, but that doesn’t help users now.)
Regarding the transition:
How practical will that be - supporting both scikit-image versions <=0.18 and >=1.0, within the same library?
This one is bigger: currently, that’s expected to be more or less impossible, or extremely painful. The SKIP bets on most libraries/users making a clean transition. Indeed that didn’t work out well for Python 2/3, but I think this transition is easier by at least two orders of magnitude. The Python 2/3 transition required rebuilding all the libraries in PyPI, including, for compiled libraries, subtle issues with the ABI that I don’t understand to this day. The scikit-image transition will involve:
- find/replace for certain function/attribute names - adding a call to `rescale_to_float` (or similar) before some function calls - transposing images in some calls to `transform.warp`
I think these can be adequately addressed by a simple migration guide.
The added complication is downstream packages of downstream packages. If you have a “dependency diamond” where A depends on B and C, both of which depend on scikit-image, A is stuck on the minimal version that B and C depend on, which can dramatically slow down the transition. This was brought up by Mark during the API meetings. I will add an explicit note to the SKIP because it’s a very important point. Here again I think scikit-image’s job is two orders of magnitude easier than Python’s: to a first approximation, I think scikit-image is high-level enough that this situation will be quite rare, and certainly won’t propagate many levels up like it did with Python. I also hope that most people will not feel compelled to support 0.19 and 1.0 simultaneously, as people did with Python.
So, based on the above, in my opinion, the only candidates are:
- accept the SKIP
- create a new library
- do nothing
and of those three, the SKIP is the most attractive. The new library has even higher potential of fragmenting the community imho.
Having said this, I would definitely still entertain `skimage.v0.old_module`. Alex was most reluctant about this, and I totally understand his reluctance, but it does fix the simultaneous dependency thing. To me it’s still much better than the slow deprecation path.
Juan.
On 20 Jul 2021, at 3:20 am, Stefan van der Walt <stefanv@berkeley.edu> wrote:
Hi Juan,
Thank you for writing up the SKIP; you clearly put a lot of thought and effort into it.
Reading the SKIP, and then reflecting on Matthew's email, I again started feeling very uncomfortable with what we are proposing to do here. My gut feeling has always been that the magical pinning solution is bound to cause trouble. I just can't shake the sense that it is going to cause a lot of headaches and confusion.
More importantly, thinking about what Matthew said and then looking at the list of silent API changes, there really is only one issue we need to address: data conversion. All the other API changes can be handled through a slightly more laborious process.
The data conversion issue is thorny. I know for a fact that a bunch of my code will break with proposed skimage 1.0, because the assumption of 0-255 uint8 meaning 0-1 float is so deeply baked in. No matter how we handle it, it's going to be messy; but doing it silently, even if widely advertised, can be very disruptive.
Perhaps we should reconsider biting the bullet and going through the slower, trusted deprecation process? It is true that we are short on person-hours to do the work, but if we estrange part of our user base, increasing volunteer time will become even harder.
I'm sorry; I know we've had these discussions over and over, and that it is frustrating to have someone come on board and then fall off again. I keep wanting to be OK with a more radical move, but I am just not that confident we can pull it off.
Best regards, Stéfan
On Sun, Jul 18, 2021, at 21:32, Juan Nunez-Iglesias wrote:
Dear skimagers,
We are aiming to release scikit-image 1.0 near the end of the year. We are, however, planning to make a number of breaking changes in the API that will affect downstream libraries. We have published a proposal for how we plan to do this at https://bit.ly/skip-3. The gist of it is:
- we'll release 0.19 in the coming weeks.
- we'll release 0.20 immediately after, which will be exactly the same but with a warning to pin scikit-image to `<0.20` (for those that want to stay in 0.x land indefinitely) or `!=0.20.*` (for those that want to be "on the ball" when 1.0 is released and update their code as soon as possible).
- we'll publish a transition guide along with 1.0rc0, and maintain 0.19.x with bug fixes for another year to give users time to transition.
The document describes alternative approaches ("change the wheels on the bus while still driving it" or "make a new bus with a new name") and why the core team ultimately chose to promote the current approach. Nonetheless, scikit-image is committed to being a community-led project, so we are still gathering feedback and can make substantive modifications to the plan going forward. Please don't hesitate to voice your concerns so we can make the best choice for our entire community going forward!
Thank you,
Juan.
scikit-image mailing list -- scikit-image@python.org <mailto:scikit-image@python.org> To unsubscribe send an email to scikit-image-leave@python.org <mailto:scikit-image-leave@python.org> https://mail.python.org/mailman3/lists/scikit-image.python.org/ <https://mail.python.org/mailman3/lists/scikit-image.python.org/> Member address: jni@fastmail.com <mailto:jni@fastmail.com>
--
Matt Newville <newville at cars.uchicago.edu> 630-327-7411
Hi, On Tue, Jul 20, 2021 at 2:22 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
Hi Stéfan, Matthew,
Thanks Matthew for weighing in. Stéfan, no worries about having second thoughts — the purpose of the SKIP process is to account for these! But, my responses below — I’m still in favour of the SKIP.
There are two main concerns: breaking the Hinsen rule, and supporting users through the transition. Regarding the 1st:
Perhaps we should reconsider biting the bullet and going through the slower, trusted deprecation process? It is true that we are short on person-hours to do the work, but if we estrange part of our user base increasing volunteer time will become even harder.
I think this is actually *counterproductive* in the context of the Konrad Hinsen rule: if we change slowly with a deprecation path of adding `preserve_range=False` whose default value migrates to True over two versions, then *you end up breaking the Hinsen rule*, just over a slightly longer timescale.
And you don’t get your compatibility layer — earlier versions of the library don’t have `preserve_range` so you can’t easily support the later version and the earlier version simultaneously.
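The burden Juan describes is visible in the shim a downstream library would need during a slow deprecation. `resize_old` and `resize_new` below are stand-ins for the pre- and post-deprecation signatures, not real scikit-image functions:

```python
# Sketch of supporting two library versions at once during a deprecation:
# one without `preserve_range`, one with it. Both functions are stand-ins.
import inspect

def resize_old(image):  # pre-deprecation signature: always rescales
    return [p / 255 for p in image]

def resize_new(image, preserve_range=False):  # post-deprecation signature
    return list(image) if preserve_range else [p / 255 for p in image]

def call_resize(func, image):
    """Pass preserve_range only when the installed version accepts it."""
    if "preserve_range" in inspect.signature(func).parameters:
        return func(image, preserve_range=True)
    return func(image)

print(call_resize(resize_old, [255]))  # [1.0]  -- old behaviour, no choice
print(call_resize(resize_new, [255]))  # [255]  -- new behaviour requested
```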
In contrast, if we go with the 1.0 approach and bundle a large number of breaking API changes, the odds of breaking the Hinsen rule are much smaller: almost all skimage 0.x scripts will fail with skimage 1.0.
Is that really true? That you'd expect nearly all scikit-image scripts to fail with errors on version 1.0?

The wisdom of the Hinsen rule, I think, is the understanding of how much scientific software exists as scripts, or de-facto packages, but with little formal maintenance, such as specification of package dependencies. If you make all of these break (if they are lucky) or give completely wrong results, it's hard to imagine you aren't going to cause significant damage to the rest-of-iceberg body of users who are not on the mailing list.

I guess the next question is about skimage2 and fracturing of the community. Could you say more about what this means in practice? I mean, in what sense will the community be more fractured compared to the skimage2 alternative? At the moment this is hard to understand, because you are saying that you expect almost no packages to simultaneously maintain code for scikit-image 0.18 and 1.0 - and that you accept that a lot of current code is going to break or give silently wrong results. That seems like community fracture plus a lot of lost goodwill - compared to the situation where those who know what they are doing can just shift over to skimage2.

I suppose you could reduce maintenance by having skimage import skimage2, as in:

```
def some_func(data):
    fdata = scale_to_float(data)
    return skimage2.some_func(fdata)
```

Then the skimage package becomes your v0 namespace.

Cheers,
Matthew
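Expanded slightly, the wrapper idea Matthew sketches would make the old `skimage` entry point a thin layer that rescales and delegates to the new code. All names below are illustrative stand-ins, not real scikit-image APIs:

```python
# Sketch of "skimage becomes the v0 namespace": old entry points rescale
# input and delegate to the new package. All names here are illustrative.

def scale_to_float(data, dtype_max=255):
    return [d / dtype_max for d in data]

def skimage2_some_func(fdata):
    """Stand-in for the new API: expects floats, no implicit rescaling."""
    return max(fdata)

def some_func(data):
    """Old API kept alive as a wrapper: rescale, then delegate."""
    return skimage2_some_func(scale_to_float(data))

print(some_func([0, 128, 255]))  # 1.0, matching the 0.x float convention
```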
Hello everybody, and thank you all for your feedback!

I am sorry, but I don't really understand all the worry about code breaking... Versions 0.19 and older will not disappear when v1.0 is released! Python environments are well adopted, and installing specific versions of packages is easy now (pip, conda...). Moreover, we can still point users to good tutorials on how to use them.

Versioning in general is getting more and more popular in the scientific community, as it helps with the reproducibility of results (code and dataset versioning). In scikit-image, we adopted semantic versioning, as it is widely used in the engineering community. This convention covers API breakage, and that's what we are doing by releasing v1.0.

I don't think that maintaining two packages is a good deal: forking only for users not adopting modern Python programming conventions is not reasonable to me.

Cheers,
Riadh.

Le 20/07/2021 à 10:58, Matthew Brett a écrit :
Hi,
On Tue, Jul 20, 2021 at 2:22 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
Hi Stéfan, Matthew,
Thanks Matthew for weighing in. Stéfan, no worries about having second thoughts — the purpose of the SKIP process is to account for these! But, my responses below — I’m still in favour of the SKIP.
There are two main concerns: breaking the Hinsen rule, and supporting users through the transition. Regarding the 1st:
Perhaps we should reconsider biting the bullet and going through the slower, trusted deprecation process? It is true that we are short on person-hours to do the work, but if we estrange part of our user base increasing volunteer time will become even harder.

I think this is actually *counterproductive* in the context of the Konrad Hinsen rule: if we change slowly with a deprecation path of adding `preserve_range=False` whose default value migrates to True over two versions, then *you end up breaking the Hinsen rule*, just over a slightly longer timescale.
And you don’t get your compatibility layer — earlier versions of the library don’t have `preserve_range` so you can’t easily support the later version and the earlier version simultaneously.
In contrast, if we go with the 1.0 approach and bundle a large number of breaking API changes, the odds of breaking the Hinsen rule are much smaller: almost all skimage 0.x scripts will fail with skimage 1.0.

Is that really true? That you'd expect nearly all scikit-image scripts to fail with errors on version 1.0?
The wisdom of the Hinsen rule, I think, is the understanding of how much scientific software exists as scripts, or de-facto packages, but with little formal maintenance, such as specification of package dependencies.
If you make all these break (if they are lucky) or give completely wrong results, it's hard to imagine you aren't going to cause significant damage to the rest-of-iceberg body of users who are not on the mailing list.
I guess the next question is about skimage2 and fracturing of the community. Could you say more about what this means in practice? I mean, in what sense will the community be more fractured compared to the skimage2 alternative? At the moment this is hard to understand, because you are saying that you are expecting almost no packages to be simultaneously maintaining code for scikit-image 0.18 and 1.0 - and that you are accepting that a lot of current code is going to break or give silently wrong results. That seems like community fracture plus a lot of lost goodwill - compared to the situation where those who know what they are doing can just shift over to skimage2.
I suppose you could reduce maintenance by having skimage import skimage2, as in:
def some_func(data):
    fdata = scale_to_float(data)
    return skimage2.some_func(fdata)
Then the skimage package becomes your v0 namespace.
Cheers,
Matthew
Hi, On Tue, Jul 20, 2021 at 11:03 AM Riadh <rfezzani@gmail.com> wrote:
Hello every body,
and thank you all for your feedback!
I am sorry, but I don't really understand all the worry about code breaking... Versions 0.19 and older will not disappear when v1.0 is released!
I bet that all of our understanding will rapidly increase when it happens!
Python environments are well adopted and installing specific versions of packages is easy now (pip, conda...). Moreover, we can still point users to good tutorials on how to use them.
Versioning in general is getting more and more popular in the scientific community, as it helps with the reproducibility of results (code and dataset versioning). In scikit-image, we adopted semantic versioning, as it is widely used in the engineering community. This convention covers API breakage, and that's what we are doing by releasing v1.0.
I think the underlying problem here is that developers generally have a very different set of instincts to users. Developers typically update their machines often, don't worry too much about code breaking, and may (although, surely this is uncommon?) pay some attention to changes in major version numbers. I think users generally do few of those things.

I'm a developer, and I certainly don't expect huge API changes with changes in major version numbers, even though I know about semantic versioning. In practice, I think it is very uncommon to make large API changes between major versions, and especially, Hinsen-breakage API changes.

So I think it's fairly typical for developers to think 'What's the problem?', because this kind of problem isn't a problem for them. But I think you'll find it is a major problem for almost everyone who is not a scikit-image developer, and does use scikit-image, and especially those who are not reading this email thread.
I don't think that maintaining two packages is a good deal: forking only for users not adopting modern python programing convention is not reasonable to me.
Just to return to the paragraph above, I bet that "users not adopting modern python programming convention" is a reasonable description of your average user.

But - to return to the practical point - what is the extra work, and fracturing, that you see for the skimage / skimage2 option? If you don't want to maintain 0.18 / skimage - then no problem - just abandon skimage and work on skimage2. If you do want to backport fixes, then you might (or might not) find it easier to use skimage as a namespace, and use the current skimage2 as the home of the real code.

Now the question again - what is the real advantage of calling the new version skimage 1.0 instead of skimage2? I mean, in what sense is the community less fractured? At the moment I am stuck with the implausible answer that it's a good idea to break lots of code to force developers onto the new skimage 1.0 version from the old one. But you can't mean that. So - can you say more about what you do mean?

Cheers,
Matthew
Hi, Le 20/07/2021 à 12:21, Matthew Brett a écrit :
I think the underlying problem here is that developers generally have a very different set of instincts to users. Developers typically update their machines often, don't worry too much about code breaking, and may (although, surely this is uncommon?) pay some attention to changes in major version numbers. I think users generally do few of those things. I'm a developer, and I certainly don't expect huge API changes with changes in major version numbers, even though I know about semantic versioning. In practice, I think it is very uncommon to make large API changes between major versions, and especially, Hinsen-breakage API changes.
I agree with you; developers may not even worry about code breaking, as they may consider it a challenge to solve :D
So I think it's fairly typical for developers to think 'What's the problem?', because this kind of problem isn't a problem for them. But I think you'll find it is a major problem for almost everyone who is not a scikit-image developer, and does use scikit-image, and especially those who are not reading this email thread.
This is not my reasoning: my point is that it is in fact not a problem, since a simple solution exists and is identified: communicate, and offer clear documentation to the community.
Just to return to the paragraph above, I bet that "users not adopting modern python programming convention" is a reasonable description of your average user.
You are right, and I bet that a large majority doesn't maintain a script or a package that will break with the v1.0 release. Users usually take the API as it is when they are coding. More "serious" coders usually care about code breaking, and if they do, they pin their dependency versions. If they don't, I am pretty sure that they will be happy to learn how to do it ;)
But - to return to the practical point - what is the extra work, and fracturing, that you see for the skimage / skimage2 option?
If you don't want to maintain 0.18 / skimage - then no problem - just abandon skimage and work on skimage2.
If you do want to backport fixes, then you might (or might not) find it easier to use skimage as a namespace, and use the current skimage2 as the home of the real code.
Maintaining a second package also means managing the infrastructure (Git, doc, CI, deployment...)
Now the question again - what is the real advantage of calling the new version skimage 1.0 instead of skimage2? I mean, in what sense is the community less fractured? At the moment I am stuck with the implausible answer that it's a good idea to break lots of code to force developers onto the new skimage 1.0 version from the old one. But you can't mean that. So - can you say more about what you do mean?
But what is the advantage of skimage2 vs skimage v1.0 if code breakage is a fake problem? I am still convinced that we are not breaking code:

```
conda create -n old_skimage_env scikit-image=0.18
conda activate old_skimage_env
python my_old_script_broken_by_v1.py
```

And finally, I am not sure, as a nostalgic skimage user, that I would be happy to call `import skimage2` :)

Cheers,
Riadh.
Hi. On Tue, Jul 20, 2021 at 12:35 PM Riadh <rfezzani@gmail.com> wrote:
Hi,
Le 20/07/2021 à 12:21, Matthew Brett a écrit :
I think the underlying problem here is that developers generally have a very different set of instincts to users. Developers typically update their machines often, don't worry too much about code breaking, and may (although, surely this is uncommon?) pay some attention to changes in major version numbers. I think users generally do few of those things. I'm a developer, and I certainly don't expect huge API changes with changes in major version numbers, even though I know about semantic versioning. In practice, I think it is very uncommon to make large API changes between major versions, and especially, Hinsen-breakage API changes.
I agree with you, developers may even don't worry about code breaking, as they may consider it as challenges to solve :D
So I think it's fairly typical for developers to think 'What's the problem?', because this kind of problem isn't a problem for them. But I think you'll find it is a major problem for almost everyone who is not a scikit-image developer, and does use scikit-image, and especially those who are not reading this email thread.
This is not my reasoning: my point is that it is in fact not a problem since a simple solution exists and is identified: communicate and offer a clear documentation to the community.
Just to return to the paragraph above, I bet that "users not adopting modern python programming convention" is a reasonable description of your average user.
You are right, and I bet that a large majority doesn't maintain a script or a package that will break with the v1.0 release.
Juan previously said:
In contrast, if we go with the 1.0 approach and bundle a large number of breaking API changes, the odds of breaking the Hinsen rule are much smaller: almost all skimage 0.x scripts will fail with skimage 1.0.
Is there some controversy about how much code would break? Juan suggests that nearly all scripts will break - and indeed, that's a feature, because then they won't get Hinsen breakage. I'm not sure I agree with Juan's point there - I bet you'll find people fixing the errors but not noticing the Hinsen breakage - but that's a side point.
Users usually use the API as it is when they are coding. And more "serious" coders usually care about code breaking, and if they do, they pin they dependencies versions. If they don't, I am pretty sure that they will be happy to learn how to do it ;)
But - to return to the practical point - what is the extra work, and fracturing, that you see for the skimage / skimage2 option?
If you don't want to maintain 0.18 / skimage - then no problem - just abandon skimage and work on skimage2.
If you do want to backport fixes, then you might (or might not) find it easier to use skimage as a namespace, and use the current skimage2 as the home of the real code.
Maintaining a second package also means managing the infrastructure (Git, doc, CI, deployment...)
Now the question again - what is the real advantage of calling the new version skimage 1.0 instead of skimage2? I mean, in what sense is the community less fractured? At the moment I am stuck with the implausible answer that it's a good idea to break lots of code to force developers onto the new skimage 1.0 version from the old one. But you can't mean that. So - can you say more about what you do mean?
But what is the advantage of skimage2 vs skimage v1.0 if code breakage is a fake problem? I am still convinced that we are not breaking code:

```
conda create -n old_skimage_env scikit-image=0.18
conda activate old_skimage_env
python my_old_script_broken_by_v1.py
```
And finally, I am not sure, as a nostalgic skimage user, to be happy to call `import skimage2` :)
I can see that argument, but is that argument enough to justify breaking a large proportion of current user code? Cheers, Matthew
On Tue, Jul 20, 2021 at 7:35 AM Riadh <rfezzani@gmail.com> wrote:
Hi,
Le 20/07/2021 à 12:21, Matthew Brett a écrit :
I think the underlying problem here is that developers generally have a very different set of instincts to users. Developers typically update their machines often, don't worry too much about code breaking, and may (although, surely this is uncommon?) pay some attention to changes in major version numbers. I think users generally do few of those things. I'm a developer, and I certainly don't expect huge API changes with changes in major version numbers, even though I know about semantic versioning. In practice, I think it is very uncommon to make large API changes between major versions, and especially, Hinsen-breakage API changes.
I agree with you, developers may even don't worry about code breaking, as they may consider it as challenges to solve :D
So I think it's fairly typical for developers to think 'What's the problem?', because this kind of problem isn't a problem for them. But I think you'll find it is a major problem for almost everyone who is not a scikit-image developer, and does use scikit-image, and especially those who are not reading this email thread.
This is not my reasoning: my point is that it is in fact not a problem since a simple solution exists and is identified: communicate and offer a clear documentation to the community.
Just to return to the paragraph above, I bet that "users not adopting modern python programming convention" is a reasonable description of your average user.
You are right, and I bet that a large majority doesn't maintain a script or a package that will break with the v1.0 release.
Users usually use the API as it is when they are coding. And more "serious" coders usually care about code breaking, and if they do, they pin they dependencies versions. If they don't, I am pretty sure that they will be happy to learn how to do it ;)
But - to return to the practical point - what is the extra work, and fracturing, that you see for the skimage / skimage2 option?
If you don't want to maintain 0.18 / skimage - then no problem - just abandon skimage and work on skimage2.
If you do want to backport fixes, then you might (or might not) find it easier to use skimage as a namespace, and use the current skimage2 as the home of the real code.
Maintaining a second package also means managing the infrastructure (Git, doc, CI, deployment...)
I don't think forking the repository and maintaining separate packages is a viable option, but we discussed the possibility of following a similar path to what OpenCV did when they released 2.0. At that point they changed the top-level import to cv2 and kept the old 1.x API available via cv2.cv. With OpenCV 3.0 they dropped cv2.cv altogether, and they are still using the cv2 import name today.

If we did a similar thing with "from skimage2 import ..." (new API) vs "from skimage2.skimage import ..." (old API), it is still a bit more work for maintainers to test and maintain both namespaces, but we would still only have one "skimage" on PyPI and could keep our existing infrastructure. (I think this is also what Matthew was suggesting with possibly having the old API be implemented as thin wrappers around the new one?)

This prevents the Hinsen-style breakage because "import skimage" would immediately fail in 1.0, but old code and existing libraries could be easily adapted to use skimage2.skimage until they have migrated to skimage2. It is also fine to remove "skimage2.skimage" in the next major release without causing silent breakage. Aesthetically, "import skimage2" is a little worse than "import skimage", but that is not something I am too concerned about.
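The layout Gregory describes can be sketched with stand-in modules: the new API lives at the top level of one distribution, and the legacy API is a subpackage of thin wrappers over it. Everything below is hypothetical, including the function names:

```python
# Sketch of the OpenCV-style single-distribution layout:
#   skimage2/            -> new API
#   skimage2/skimage/    -> old API as thin wrappers over the new one
# Stand-in modules are built in memory here; all names are hypothetical.
import types

skimage2 = types.ModuleType("skimage2")  # stand-in for the new top level
skimage2.rescale_to_float = lambda xs: [x / 255 for x in xs]
skimage2.mean = lambda xs: sum(xs) / len(xs)

legacy = types.ModuleType("skimage2.skimage")  # stand-in compat subpackage

def _legacy_mean(data):
    # Old convention: implicitly rescale integer input before computing.
    return skimage2.mean(skimage2.rescale_to_float(data))

legacy.mean = _legacy_mean

print(legacy.mean([0, 255]))    # 0.5 under the old 0..1 float convention
print(skimage2.mean([0, 255]))  # 127.5 under the new explicit convention
```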
Now the question again - what is the real advantage of calling the new version skimage 1.0 instead of skimage2? I mean, in what sense is the community less fractured? At the moment I am stuck with the implausible answer that it's a good idea to break lots of code to force developers onto the new skimage 1.0 version from the old one. But you can't mean that. So - can you say more about what you do mean?
But what is the advantage of skimage2 vs skimage v1.0 if code breakage is a fake problem? I am still convinced that we are not breaking code:

```
conda create -n old_skimage_env scikit-image=0.18
conda activate old_skimage_env
python my_old_script_broken_by_v1.py
```
And finally, I am not sure, as a nostalgic skimage user, to be happy to call `import skimage2` :)
Cheers,
Riadh.
Hi,

I can cite the example of TensorFlow, which didn't hesitate to break the API when moving from v1.x to v2.

Le 20/07/2021 à 14:32, Gregory Lee a écrit :
On Tue, Jul 20, 2021 at 7:35 AM Riadh <rfezzani@gmail.com <mailto:rfezzani@gmail.com>> wrote:
Hi,
Le 20/07/2021 à 12:21, Matthew Brett a écrit :
> I think the underlying problem here is that developers generally have
> a very different set of instincts to users. Developers typically
> update their machines often, don't worry too much about code breaking,
> and may (although, surely this is uncommon?) pay some attention to
> changes in major version numbers. I think users generally do few of
> those things. I'm a developer, and I certainly don't expect huge API
> changes with changes in major version numbers, even though I know
> about semantic versioning. In practice, I think it is very uncommon
> to make large API changes between major versions, and especially,
> Hinsen-breakage API changes.
I agree with you, developers may even don't worry about code breaking, as they may consider it as challenges to solve :D
> So I think it's fairly typical for developers to think 'What's the
> problem?', because this kind of problem isn't a problem for them. But
> I think you'll find it is a major problem for almost everyone who is
> not a scikit-image developer, and does use scikit-image, and
> especially those who are not reading this email thread.
This is not my reasoning: my point is that it is in fact not a problem since a simple solution exists and is identified: communicate and offer a clear documentation to the community.
> Just to return to the paragraph above, I bet that "users not adopting
> modern python programming convention" is a reasonable description of
> your average user.
You are right, and I bet that a large majority doesn't maintain a script or a package that will break with the v1.0 release.
Users usually use the API as it is when they are coding. And more "serious" coders usually care about code breaking, and if they do, they pin they dependencies versions. If they don't, I am pretty sure that they will be happy to learn how to do it ;)
> But - to return to the practical point - what is the extra work, and
> fracturing, that you see for the skimage / skimage2 option?
>
> If you don't want to maintain 0.18 / skimage - then no problem - just
> abandon skimage and work on skimage2.
>
> If you do want to backport fixes, then you might (or might not) find
> it easier to use skimage as a namespace, and use the current skimage2
> as the home of the real code.
Maintaining a second package also means managing the infrastructure (Git, doc, CI, deployment...)
I don't think forking the repository and maintaining separate packages is a viable option, but we discussed the possibility of following a similar path to what OpenCV did when they released 2.0. At that point they changed the top-level import to cv2 and kept the old 1.x API available via cv2.cv. With OpenCV 3.0 they dropped cv2.cv altogether, and they are still using the cv2 import name today.

If we did a similar thing with "from skimage2 import ..." (new API) vs "from skimage2.skimage import ..." (old API), it is still a bit more work for maintainers to test and maintain both namespaces, but we would still only have one "skimage" on PyPI and could keep our existing infrastructure. (I think this is also what Matthew was suggesting with possibly having the old API be implemented as thin wrappers around the new one?)

This prevents the Hinsen-style breakage because "import skimage" would immediately fail in 1.0, but old code and existing libraries could be easily adapted to use skimage2.skimage until they have migrated to skimage2. It is also fine to remove "skimage2.skimage" in the next major release without causing silent breakage. Aesthetically, "import skimage2" is a little worse than "import skimage", but that is not something I am too concerned about.
> Now the question again - what is the real advantage of calling the new
> version skimage 1.0 instead of skimage2? I mean, in what sense is
> the community less fractured? At the moment I am stuck with the
> implausible answer that it's a good idea to break lots of code to
> force developers onto the new skimage 1.0 version from the old one.
> But you can't mean that. So - can you say more about what you do
> mean?
But what is the advantage of skimage2 vs skimage v1.0 if code breakage is a fake problem? I am still convinced that we are not breaking code:

```
conda create -n old_skimage_env scikit-image=0.18
conda activate old_skimage_env
python my_old_script_broken_by_v1.py
```
And finally, I am not sure, as a nostalgic skimage user, to be happy to call `import skimage2` :)
Cheers,
Riadh.
On Tue, Jul 20, 2021 at 1:33 PM Gregory Lee <grlee77@gmail.com> wrote:
On Tue, Jul 20, 2021 at 7:35 AM Riadh <rfezzani@gmail.com> wrote:
Hi,
Le 20/07/2021 à 12:21, Matthew Brett a écrit :
I think the underlying problem here is that developers generally have a very different set of instincts to users. Developers typically update their machines often, don't worry too much about code breaking, and may (although, surely this is uncommon?) pay some attention to changes in major version numbers. I think users generally do few of those things. I'm a developer, and I certainly don't expect huge API changes with changes in major version numbers, even though I know about semantic versioning. In practice, I think it is very uncommon to make large API changes between major versions, and especially, Hinsen-breakage API changes.
I agree with you; developers may not even worry about code breaking, as they may see it as a challenge to solve :D
So I think it's fairly typical for developers to think 'What's the problem?', because this kind of problem isn't a problem for them. But I think you'll find it is a major problem for almost everyone who is not a scikit-image developer, and does use scikit-image, and especially those who are not reading this email thread.
This is not my reasoning: my point is that it is in fact not a problem, since a simple solution exists and is identified: communicate, and offer clear documentation to the community.
Just to return to the paragraph above, I bet that "users not adopting modern python programming convention" is a reasonable description of your average user.
You are right, and I bet that a large majority doesn't maintain a script or a package that will break with the v1.0 release.
Users usually use the API as it is when they are coding. More "serious" coders usually care about code breaking, and if they do, they pin their dependency versions. If they don't, I am pretty sure they will be happy to learn how to do it ;)
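The pinning described above amounts to a one-line constraint in a requirements file; for instance (the bound below simply mirrors the `<0.20` pin suggested at the top of the thread, as an illustration):

```
# requirements.txt -- stay in 0.x land until ready to migrate
scikit-image<0.20
```

A pinned environment like this keeps old scripts running unchanged, whatever happens in 1.0.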
But - to return to the practical point - what is the extra work, and fracturing, that you see for the skimage / skimage2 option?
If you don't want to maintain 0.18 / skimage - then no problem - just abandon skimage and work on skimage2.
If you do want to backport fixes, then you might (or might not) find it easier to use skimage as a namespace, and use the current skimage2 as the home of the real code.
Maintaining a second package also means managing the infrastructure (Git, doc, CI, deployment...)
I don't think forking the repository and maintaining separate packages is a viable option, but we discussed the possibility of following a similar path to what OpenCV did when they released 2.0. At that point they changed the top-level import to cv2 and kept the old 1.x API available via cv2.cv. With OpenCV 3.0 they dropped cv2.cv altogether and are still using the cv2 import name today. If we did a similar thing with "from skimage2 import ..." (new API) vs "from skimage2.skimage import ..." (old API), it is still a bit more work for maintainers to test and maintain both namespaces, but we would still only have one "skimage" on PyPI and could keep our existing infrastructure (I think this is also what Matthew was suggesting, with the old API possibly implemented as thin wrappers around the new one). This prevents the Hinsen-style breakage because "import skimage" would immediately fail in 1.0, but old code and existing libraries could be easily adapted to use skimage2.skimage until they have migrated to skimage2. It is also fine to remove "skimage2.skimage" in the next major release without causing silent breakage. Aesthetically, "import skimage2" is a little worse than "import skimage", but not something I am too concerned about.
That does seem like a very reasonable compromise, where the only real expense is the slightly kludgy name. And it means that those of us depending on skimage can just do:

```
try:
    import skimage
except ImportError:
    import skimage2.skimage as skimage
```

or similar, to remain compatible for the period until 2.0 - which I imagine will be long enough that most of us will be happy to drop the < 1.0 branch at that point. Cheers, Matthew
On 20/07/2021 at 15:10, Matthew Brett wrote:
On Tue, Jul 20, 2021 at 1:33 PM Gregory Lee <grlee77@gmail.com> wrote: That does seem like a very reasonable compromise, where the only real expense is the slightly kludgy name.
And it means that those of us depending on skimage can just do:
```
try:
    import skimage
except ImportError:
    import skimage2.skimage as skimage
```
I am sorry, but this looks like a Hinsen rule break... But if such a thing is acceptable, I am more inclined toward TensorFlow's strategy <https://www.tensorflow.org/guide/migrate?hl=en>: ``` import skimage.compat.v1 as skimage skimage.disable_v2_behavior() ```
Hi, On Tue, Jul 20, 2021 at 3:33 PM Riadh <rfezzani@gmail.com> wrote:
On 20/07/2021 at 15:10, Matthew Brett wrote:
On Tue, Jul 20, 2021 at 1:33 PM Gregory Lee <grlee77@gmail.com> wrote:
That does seem like a very reasonable compromise, where the only real expense is the slightly kludgy name.
And it means that those of us depending on skimage can just do:
```
try:
    import skimage
except ImportError:
    import skimage2.skimage as skimage
```
I am sorry, but this looks like a Hinsen rule break... But if such a thing is acceptable, I am more inclined toward TensorFlow's strategy:
Could you explain more? From what I understood, `skimage` in these two imports would always be the < 1.0 API - and so no breakage at all. This is in the situation Gregory described where the top-level import in 1.0 would be `import skimage2`.
import skimage.compat.v1 as skimage
skimage.disable_v2_behavior()
Could you explain what you mean here? By "v1" do you mean what I have been calling the version < 1.0? What is the second line designed to do; haven't I already specified what behavior I want with the first line? Cheers, Matthew
On Tue, Jul 20, 2021, at 05:32, Gregory Lee wrote:
This prevents the Hinsen-style breakage because "import skimage" would immediately fail in 1.0, but old codes and existing libraries could be easily adapted to use skimage2.skimage until they have migrated to skimage2. It is also fine to remove "skimage2.skimage" in the next major release without causing silent breakage. Aesthetically, "import skimage2" is a little worse than "import skimage", but not something I am too concerned about.
To be fair to Juan, I think this was one of his initial suggestions, but some of us balked at the thought of renaming the library or maintaining two versions. Hence the suggested technical footwork.

But, reading the arguments here, I am convinced that the only way to avoid Hinsen-type changes AND give programmatic errors for all future versions is to change the import name.

Then, there is the question of whether to support the existing API inside of `skimage2`. My gut feel is to make different packages (`pip install scikit-image` becomes `pip install skimage2`), and to let people hang on to `import skimage` until they are ready to `import skimage2`. We can also backport bugfixes for a while.

This is getting into the weeds, but if we go this route we should probably match the version numbers --- `skimage 2.0` imports as `skimage2` and simply skip 1.0.

Stéfan
Hey everyone, this is getting big :) Sorry I won't answer all the great points that both Matts, Stéfan, Riadh, Greg, and Juan made here, but:

[TL;DR]
1. Sorry Juan, I take it back! :)
2. How about skimage.v0?
To be fair to Juan, I think this was one of his initial suggestions, but some of us balked at the thought of renaming the library or maintaining two versions. Hence the suggested technical footwork.
I was one of the first naysayers, and I'd like to take it back. I'd rather have the library move forward in the direction we planned than wait ages to propose the API and general changes we would like to make.

On the names: I humbly think that "skimage" should always point to the latest version. I like the idea of using "from skimage.v0 import ...", for instance. The breakage would be minimal, we would still have v0 in a clean way, and we would have the space to go wild in 1.0. If we have the same discussion in 10 years for skimage 2, we can have "from skimage.v1 import ..." and we go on with our lives.

I think that is the closest we'd get to pleasing Greeks and Trojans (agradar gregos e troianos, in good Portuguese) :) My two BRL cents, though.

Kind regards, Alex

On Tue, 2021-07-20 at 11:11 -0700, Stefan van der Walt wrote:
On Tue, Jul 20, 2021, at 05:32, Gregory Lee wrote:
This prevents the Hinsen-style breakage because "import skimage" would immediately fail in 1.0, but old codes and existing libraries could be easily adapted to use skimage2.skimage until they have migrated to skimage2. It is also fine to remove "skimage2.skimage" in the next major release without causing silent breakage. Aesthetically, "import skimage2" is a little worse than "import skimage", but not something I am too concerned about.
To be fair to Juan, I think this was one of his initial suggestions, but some of us balked at the thought of renaming the library or maintaining two versions. Hence the suggested technical footwork.
But, reading the arguments here, I am convinced that the only way to avoid Hinsen-type changes AND give programmatic errors for all future versions is to change the import name.
Then, there is the question of whether to support the existing API inside of `skimage2`. My gut feel is to make different packages (`pip install scikit-image` becomes `pip install skimage2`), and to let people hang on to `import skimage` until they are ready to `import skimage2`. We can also backport bugfixes for a while.
This is getting into the weeds, but if we go this route we should probably match the version numbers --- `skimage 2.0` imports as `skimage2` and simply skip 1.0.
Stéfan
--
Dr. Alexandre de Siqueira
Berkeley Institute for Data Science - BIDS
190 Doe Library
University of California, Berkeley
Berkeley, CA 94720 United States
Lattes CV: 3936721630855880
ORCID: 0000-0003-1320-4347
Github: alexdesiqueira
Twitter: alexdesiqueira
On Tue, Jul 20, 2021, at 11:37, Alexandre de Siqueira wrote:
On the names: I humbly think that "skimage" should always point to the latest version.
Unfortunately, this is not possible, because it would mean that old code will start generating different results. This is the "Hinsen-rule" under discussion. `foo(x)` should always return the same thing, or the code should break. Stéfan
Hey Stéfan,
Unfortunately, this is not possible, because it would mean that old code will start generating different results. This is the "Hinsen-rule" under discussion. `foo(x)` should always return the same thing, or the code should break.
sorry if that was implicit, but this won't be against Hinsen. My point is that old code would break when using skimage v1, since we _will_ modify the API a lot. I don't think we intend to break anything silently :) Alex On Tue, 2021-07-20 at 11:44 -0700, Stefan van der Walt wrote:
On Tue, Jul 20, 2021, at 11:37, Alexandre de Siqueira wrote:
On the names: I humbly think that "skimage" should always point to the latest version.
Unfortunately, this is not possible, because it would mean that old code will start generating different results. This is the "Hinsen-rule" under discussion. `foo(x)` should always return the same thing, or the code should break.
Stéfan
On Tue, Jul 20, 2021, at 11:57, Alexandre de Siqueira wrote:
sorry if that was implicit, but this won't be against Hinsen. My point is that old code would break when using skimage v1, since we _will_ modify the API a lot. I don't think that we intend breaking anything silently :)
W.r.t. the `preserve_range` flag, we will remove the flag and no longer do implicit conversion. So, passing the same code through the two versions will give different results.

Current skimage:

```
from skimage import foo
foo(uint8_arr) -> [0, 1.0, 0.502]
```

New skimage:

```
foo(uint8_arr) -> [0, 255.0, 128.0]
```

Stéfan
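The numeric difference in that example can be reproduced with NumPy alone (a sketch; `foo` is hypothetical, but the division by 255 is what the 0.x implicit uint8-to-float conversion does):

```python
import numpy as np

uint8_arr = np.array([0, 255, 128], dtype=np.uint8)

# 0.x behavior: uint8 input is implicitly rescaled to floats in [0, 1].
old_style = uint8_arr / 255.0

# Proposed 1.0 behavior: the dtype changes, but the values are preserved.
new_style = uint8_arr.astype(np.float64)
```

Same input, same call shape, different numbers: exactly the kind of silent divergence under discussion.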
I found this presentation from Hinsen <https://calcul.math.cnrs.fr/attachments/spip/IMG/pdf/cours_reproductibilite.pdf>; correct me if I am wrong, but I found nothing in contradiction with the proposed SKIP; please see the 3rd bullet on slide 14. My understanding is that the Hinsen rule is less about forbidding API breaks than about saving dependency version numbers... Cheers, Riadh. On Tue, Jul 20, 2021 at 12:04, Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Tue, Jul 20, 2021, at 11:57, Alexandre de Siqueira wrote:
sorry if that was implicit, but this won't be against Hinsen. My point is that old code would break when using skimage v1, since we _will_ modify the API a lot. I don't think that we intend breaking anything silently :)
W.r.t. the `preserve_range` flag, we will remove the flag and no longer do implicit conversion. So, passing the same code through the two versions will give different results.
Current skimage:
from skimage import foo
foo(uint8_arr) -> [0, 1.0, 0.502]
New skimage:
foo(uint8_arr) -> [0, 255.0, 128.0]
Stéfan
On Tue, Jul 20, 2021, at 14:41, Riadh wrote:
I found this presentation from Hinsen https://calcul.math.cnrs.fr/attachments/spip/IMG/pdf/cours_reproductibilite...; correct me if I am wrong, but I found nothing in contradiction with the proposed SKIP; please see the 3rd bullet on slide 14. My understanding is that the Hinsen rule is less about forbidding API breaks than about saving dependency version numbers...
Let's make the distinction clear. An API break is when you do something like this:

v0: foo(x, rescale=True) -> out
v1: foo(x, rescale=True) -> error, rescale is not a valid keyword argument

The "Hinsen rule" (which is just a handle on the following concept; we could just as well call it "silent behavioral changes") is:

v0: foo(x) == y
v1: foo(x) != y

This last behavior change is a big problem, and one that should be avoided at all costs. The former is not so serious, because users get feedback when their code fails. Once they fix it up, it works again and works correctly.

If you do not warn your users when behavior changes, then their code can deliver the wrong results without them knowing---and this is what we need to avoid.

Stéfan

P.S. With our existing deprecation mechanism, it is possible to hit a point in the future where we accidentally have the same API calls return different results. We should be cognizant of that failure mode, and avoid it. One way is to never modify the "expected" output part of tests without careful consideration.
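For concreteness, the two failure modes can be sketched with a toy function (the `foo_*` names are hypothetical, not real skimage API): the API break fails loudly at the old call site, while the Hinsen-style change silently returns different numbers for the same call.

```python
def foo_v0(x, rescale=True):
    # v0 behavior: uint8-style values are rescaled to [0, 1] by default.
    return [v / 255 for v in x] if rescale else list(x)


def foo_v1_api_break(x):
    # A loud API break: the `rescale` keyword is removed, so old call
    # sites raise TypeError immediately instead of misbehaving.
    return list(x)


def foo_v1_silent_change(x, rescale=True):
    # A Hinsen-style break: the same signature is accepted, but `rescale`
    # is silently ignored -- old code runs and gets different numbers.
    return list(x)


data = [0, 255, 128]

v0_out = foo_v0(data, rescale=True)
try:
    foo_v1_api_break(data, rescale=True)  # loud failure: TypeError
    loud_failure = False
except TypeError:
    loud_failure = True

silent_out = foo_v1_silent_change(data, rescale=True)
# Same call as v0, no error, different result: the failure mode to avoid.
```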
Ok, seems like a good time to summarise the thread again:

Everyone agrees breaking the Hinsen rule (“avoid silent behavioural changes”) is bad. Over 4 versions, scikit-image has actually broken it a few times already. We certainly don’t want to break it en masse with 1.0.

Riadh thinks, correct me if I’m wrong Riadh, that breaking it for 1.0 is ok, *especially* given the 0.20 warning. To be honest, one thing I like about the 0.20 warning is that it will *teach* people to pay attention to version numbers. The other plans, not so much. And these are important not just in this skimage transition but throughout the ecosystem. The Hinsen rule is broken dozens of times across the ecosystem. Even NumPy allows this over “long” deprecation periods. (See the copy=’never’ discussion.)

But, back to summaries. A different issue is API breakage. This is not as bad as the Hinsen rule but it can also be bad for user goodwill. An idea that seems to be gaining momentum is to change the import name, but it’s unclear whether people favour keeping the old import name around or moving it to v0, and it’s also unclear whether people favour moving the PyPI *package* name, with scikit-image frozen forever in 0.19.
So here’s some named options:

————————

- the SKIP:
  old API import: unavailable at 1.0
  new API import: skimage
  old API package: scikit-image <1.*
  new API package: scikit-image 1.*
  Pros:
  * uses semver correctly
  * with enough warning, lets users pin their dependencies intentionally, improving the reproducibility of their packages
  Cons:
  * users who don’t run their code in the transition period won’t be warned
  * if we don’t break the API enough, risks breaking the Hinsen rule
  * if we break it completely, risks annoying users

- the frozen package:
  old API import: skimage
  new API import: skimage.v1
  old API package = new API package = scikit-image 1.*
  Pros:
  * lets anyone migrate to the new API at their own pace
  * existing code and (more important, imho) existing StackOverflow answers etc. continue to work
  Cons:
  * no pressure for anyone to move to the new API
  * could take years to get people to migrate, splintering the community
  * ability to mix code between APIs could give *very* confusing results

- the versioned package:
  old API import: skimage.v0
  new API import: skimage.v1 (skimage by itself errors)
  old API package = new API package = scikit-image 1.*
  Pros:
  * forces people to be intentional about their API choice *or* simply pin
  * no risk of breaking the Hinsen rule
  Cons:
  * minimal pressure to move to v1
  * could take years for people to migrate, splintering the community
  * ability to mix code between APIs could give *very* confusing results

- the new name, new import package:
  old API import: skimage
  new API import: skimage2
  old API package: scikit-image (any version)
  new API package: skimage2 (any version)
  Pros:
  * clear distinction between APIs both on the dependency level and the import level
  * clear when reading someone’s code which version they are using
  * no risk of breaking the Hinsen rule
  * FINALLY, our package name matches our import name 🎉
  Cons:
  * marginally more annoying import
  * confusing for package managers — see e.g. ‘conda install pyqt’ vs ‘pip install pyqt5’
  * potentially slow migration as users might not quickly become aware of skimage2
  * what do we do for subsequent versions? e.g. opencv-python is at version 4.5 but imports as cv2 🤦♂️

- new name, same import:
  old API import: skimage
  new API import: skimage
  old API package: scikit-image (any version)
  new API package: skimage[2] (any version)
  (noting here that we have skimage available and unused right now, though this might just be confusing given scikit-learn.)
  Pros:
  * get to keep the skimage import
  * FINALLY, our package name matches our import name 🎉
  Cons:
  * no pressure for people to migrate
  * no pressure for people to pin their package dependencies
  * unclear when reading code whether they are using skimage<1. Together with the previous two, this to me is a deal breaker.

————————

Any option I haven't covered? What do people prefer? After writing all of them out, my preferences oscillate between the SKIP and new name, new import.

I’ll make one more note about the SKIP: one option is to not release 1.0 for another full year, even two: i.e. we keep the warning versions for longer, together with 1.XrcY. This should give ample warning and time for people to either pin or migrate.

Juan.
On 21 Jul 2021, at 7:56 am, Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Tue, Jul 20, 2021, at 14:41, Riadh wrote:
I found this presentation from Hinsen <https://calcul.math.cnrs.fr/attachments/spip/IMG/pdf/cours_reproductibilite.pdf>; correct me if I am wrong, but I found nothing in contradiction with the proposed SKIP; please see the 3rd bullet on slide 14. My understanding is that the Hinsen rule is less about forbidding API breaks than about saving dependency version numbers...
Let's make the distinction clear:
An API break is when you do something like this:
v0: foo(x, rescale=True) -> out
v1: foo(x, rescale=True) -> error, rescale is not a valid keyword argument
The "Hinsen rule" (which is just a handle on the following concept; we could just as well call it "silent behavioral changes") is:
v0: foo(x) == y
v1: foo(x) != y
This last behavior change is a big problem, and one that should be avoided at all costs. The former is not so serious, because users get feedback when their code fails. Once they fix it up, it works again and works correctly.
If you do not warn your users when behavior changes, then their code can deliver the wrong results without them knowing---and this is what we need to avoid.
Stéfan
P.S. With our existing deprecation mechanism, it is possible to hit a point in the future where we accidentally have the same API calls return different results. We should be cognizant of that failure mode, and avoid it. One way is to never modify the "expected" output part of tests without careful consideration.
Hi Juan, On Tue, Jul 20, 2021, at 18:27, Juan Nunez-Iglesias wrote:
Everyone agrees breaking the Hinsen rule (“avoid silent behavioural changes”) is bad. Over 4 versions, scikit-image has actually broken it a few times already. We certainly don’t want to break it en-masse with 1.0.
Yes, we shouldn't do that (or have done that) 😬
Riadh thinks, correct me if I’m wrong Riadh, that breaking it for 1.0 is ok, *especially* given the 0.20 warning. To be honest, one thing I like about the 0.20 warning is that it will *teach* people to pay attention to version numbers. The other plans, not so much. And these are important not just in this skimage transition but throughout the ecosystem. The Hinsen rule is broken dozens of times across the ecosystem. Even NumPy allows this over “long” deprecation periods. (See the copy=’never’ discussion.) But, back to summaries.
Maybe you can teach people just as much this way. You can, e.g., have `import skimage` with 2.0 installed raise an error that explains exactly what's going on ("you should install skimage2 and switch to the new skimage2 API").

NumPy is not breaking the Hinsen rule with the `copy='never'` decision (hence the long discussion). Currently, `True` means always copy, and `False` means copy if necessary. If it gets changed, then `False` will be *more strict* (i.e., no invalid results), and if the enum is used then there will be no silent regression. A concern raised in that discussion is that *newer* code might break on *older* versions of NumPy, which is an important consideration, but a much less common scenario than the other way around.
A different issue is API breakage. This is not as bad as the Hinsen rule but it can also be bad for user goodwill.
I agree that we should try and minimize breakage as far as possible, but also use the new opportunity to ensure that everything is consistent---this will benefit us and the users greatly in the long run.
*- the SKIP:*
old API import: unavailable at 1.0
new API import: skimage
old API package: scikit-image <1.*
new API package: scikit-image 1.*
Pros:
* uses semver correctly
* with enough warning, lets users pin their dependencies intentionally, improving the reproducibility of their packages
Cons:
* users who don’t run their code in the transition period won’t be warned
* if we don’t break the API enough, risks breaking the Hinsen rule
* if we break it completely, risks annoying users
It feels here as though we are saying: we need to break the API enough so users know what's going on, but not so much that we drive them insane. That is a fine balance, and it may be easier to be explicit. If the principle is that we want to ensure that user code from a while ago will run correctly or break, then it makes it easier to rule out some options. After all, two of our values are: (a) consistent API and (b) ensuring correctness.
*- the frozen package:*
old API import: skimage
new API import: skimage.v1
old API package = new API package = scikit-image 1.*
Pros:
* lets anyone migrate to the new API at their own pace
* existing code and (more important, imho) existing StackOverflow answers etc. continue to work
Cons:
* no pressure for anyone to move to the new API
* could take years to get people to migrate, splintering the community
* ability to mix code between APIs could give *very* confusing results
Maintaining two versions of the API in one package, while borrowing to and fro, is a complex operation to get right. If you modify the newer version, you have to test very carefully to ensure that the older version doesn't change as well. If you only backport bugfixes to a separate package, that becomes a lot easier (if a bit labor intensive).
*- the new name, new import package:*
old API import: skimage
new API import: skimage2
old API package: scikit-image (any version)
new API package: skimage2 (any version)
This is my preference (although I don't care so much about the new API package name).
Pros:
* clear distinction between APIs both on the dependency level and the import level
* clear when reading someone’s code which version they are using
* no risk of breaking the Hinsen rule
* FINALLY, our package name matches our import name 🎉
Cons:
* marginally more annoying import
* confusing for package managers — see e.g. ‘conda install pyqt’ vs ‘pip install pyqt5’
If it is too annoying we can always keep `scikit-image` as the install name. Although I think it's easier to communicate with packagers than it is with users (there are fewer of them, at least :).
* potentially slow migration as users might not quickly become aware of skimage2
I wonder if a warning on import of scikit-image 1.x, telling users about skimage2, is considered bad form?
* what do we do for subsequent versions? e.g. opencv-python is at version 4.5 but imports as cv2 🤦♂️
skimage v2.199.0 until we need another refactor ? :)
I’ll make one more note about the SKIP: one option is to not release 1.0 for another full year, even two: ie we keep the warning versions for longer, together with 1.XrcY. This should give ample warning and time for people to either pin or migrate.
We can always hope to communicate changes adequately. But the most certain way of doing so is to let the code do the talking. If the code doesn't work, and lets the user know why, no-one will mistakenly use the wrong package. No-one has to read release notes/mailing lists, reinstall in a certain time-frame, or accidentally upgrade to the tripwire version. I also fear what will happen if beginner users run into the pinning-or-migrate solution. The simpler the technical solution we can come up with, the less likely that it will trip up our users. Stéfan
Hi, On 21/07/2021 at 03:27, Juan Nunez-Iglesias wrote:
Riadh thinks, correct me if I’m wrong Riadh, that breaking it for 1.0 is ok, *especially* given the 0.20 warning. To be honest, one thing I like about the 0.20 warning is that it will *teach* people to pay attention to version numbers. The other plans, not so much. And these are important not just in this skimage transition but throughout the ecosystem. The Hinsen rule is broken dozens of times across the ecosystem. Even NumPy allows this over “long” deprecation periods. (See the copy=’never’ discussion.) But, back to summaries.
That's in fact my opinion :)

@Stefan, the copy='never' discussion is maybe a bad example, but Hinsen himself cites Numpy (https://hal.archives-ouvertes.fr/hal-02117588/document) as a reason for his code to collapse: "Today’s contributors and maintainers of the scientific Python infrastructure come from backgrounds with a much faster time scale of change. For them, NumPy is probably a sufficiently stable infrastructure, whereas for me it isn’t..."

Scikit-image depends on Numpy/Scipy, which are already Hinsen-rule-breakers.

Back to our discussion, changing the package name or the import name will impact all our users, while only a few of them are concerned with the Hinsen rule. Let's guide those ones to use virtual envs and version pinning, which will definitely help them in developing reproducible research.

Riadh.
On Wed, Jul 21, 2021, at 00:17, Riadh wrote:
Scikit-image depends on Numpy/Scipy that are already Hinsen-rule-breakers.
I don't think NumPy and SciPy are Hinsen-rule breakers. There's a very strong sense in both communities that that should not happen. Again, to be specific, here I speak of *silent changes of behavior*, which Matthew dubbed the Hinsen-rule. Hinsen himself has *many* requirements of software, and I'm sure NumPy and SciPy don't fulfill all of them.
Back to our discussion, changing the package name or the import name will impact all our users, while only few of them are concerned with the Hinsen rule. Let's guide those ones to use virtual envs and version pinning that will definitely help them in developing reproducible research.
The changes proposed will impact all our users either way; there is no option not to be concerned. It will bite you if you have *any* pre-existing skimage code. What is being argued here is that we make it explicit when we break our contract.

We have no way of knowing who the users are that "need help" in learning how to pin versions etc. In fact, those are exactly the users you will not see on a mailing list or forum. The only way to help them is by letting the software speak for you.

If we really want to help reproducibility along, then we should build a reliable ecosystem of software that treats its users, their work, and their time with respect. By making our changes explicit, we do our best to ensure that invalid results don't accidentally end up where they shouldn't (in publications, for example).

Stéfan
I agree with all your concerns Stefan; my only point is that the changes are not silent: the major version number is upgraded, and that is, I think, a sufficient indicator that things may break in old code if someone tries to run it with the new version.

I am not against adding warnings or whatever we think is necessary to inform users about "silent changes of behavior", but I am convinced that package or import renaming is a bad move.

Riadh.

On 21/07/2021 at 09:50, Stefan van der Walt wrote:
On Wed, Jul 21, 2021, at 00:17, Riadh wrote:
Scikit-image depends on Numpy/Scipy that are already Hinsen-rule-breakers.
I don't think NumPy and SciPy are Hinsen-rule breakers. There's a very strong sense in both communities that that should not happen. Again, to be specific, here I speak of *silent changes of behavior*, which Matthew dubbed the Hinsen-rule.
Hinsen himself has *many* requirements of software, and I'm sure NumPy and SciPy don't fulfill all of them.
Back to our discussion, changing the package name or the import name will impact all our users, while only few of them are concerned with the Hinsen rule. Let's guide those ones to use virtual envs and version pinning that will definitely help them in developing reproducible research.
The changes proposed will impact all our users either way; there is no option not to be concerned. It will bite you if you have *any* pre-existing skimage code. What is being argued here is that we make it explicit when we break our contract.
We have no way of knowing who the users are that "need help" in learning how to pin versions etc. In fact, those are exactly the users you will not see on a mailing list or forum. The only way to help them is by letting the software speak for you.
If we really want to help reproducibility along, then we should build a reliable ecosystem of software that treats its users, their work, and their time with respect. By making our changes explicit, we do our best to ensure that invalid results don't accidentally end up where they shouldn't (in publications, for example).
Stéfan
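For reference, the virtual-env-plus-pinning workflow Riadh recommends amounts to a one-line requirements entry; the specifiers below mirror the ones from the 1.0 announcement at the top of this thread:

```text
# requirements.txt

# Stay in 0.x land indefinitely:
scikit-image<0.20

# Or: be "on the ball" for 1.0 while skipping the transitional 0.20 release:
# scikit-image!=0.20.*
```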
Hello,

I would be a bit cautious about the "changes are not silent because the major version number is upgraded" argument. That opens the door to doing a lot more major versions in order to "allow" for API breakage when it could be avoided.

As a user, I find that it would be nice if code that only depends on numpy, scipy, and matplotlib, started at the beginning of a research project with up-to-date packages, also worked at submission time with up-to-date versions of those packages with minimal changes to the code :p

Cheers, N

On Wed, 21 Jul 2021 at 10:35, Riadh <rfezzani@gmail.com> wrote:
I agree with all your concerns Stefan, my only point is that the changes are not silent: the major version number is upgraded and that's, I think, a sufficient indicator telling that things may break in old code if someone tries to run it with the new version.
I am not against adding warnings or whatever we think is necessary to inform users about "silent changes of behavior", but I am convinced that package or import renaming is a bad move.
Riadh.
Le 21/07/2021 à 09:50, Stefan van der Walt a écrit :
On Wed, Jul 21, 2021, at 00:17, Riadh wrote:
Scikit-image depends on Numpy/Scipy that are already Hinsen-rule-breakers.
I don't think NumPy and SciPy are Hinsen-rule breakers. There's a very strong sense in both communities that that should not happen. Again, to be specific, here I speak of *silent changes of behavior*, which Matthew dubbed the Hinsen rule.
Hinsen himself has *many* requirements of software, and I'm sure NumPy and SciPy don't fulfill all of them.
Back to our discussion, changing the package name or the import name will impact all our users, while only a few of them are concerned with the Hinsen rule. Let's guide those users to use virtual envs and version pinning, which will definitely help them in developing reproducible research.
The changes proposed will impact all our users either way; there is no option not to be concerned. It will bite you if you have *any* pre-existing skimage code. What is being argued here is that we make it explicit when we break our contract.
We have no way of knowing who the users are that "need help" in learning how to pin versions etc. In fact, those are exactly the users you will not see on a mailing list or forum. The only way to help them is by letting the software speak for you.
If we really want to help reproducibility along, then we should build a reliable ecosystem of software that treats its users, their work, and their time with respect. By making our changes explicit, we do our best to ensure that invalid results don't accidentally end up where they shouldn't (in publications, for example).
Stéfan
_______________________________________________ scikit-image mailing list -- scikit-image@python.org To unsubscribe send an email to scikit-image-leave@python.org https://mail.python.org/mailman3/lists/scikit-image.python.org/ Member address: nelle.varoquaux@gmail.com
Le 21/07/2021 à 10:55, Nelle Varoquaux a écrit :
Hello,
I would be a bit cautious about the "changes are not silent because the major version number is upgraded" argument. That opens the door to doing a lot more major versions in order to "allow" for API breakage when it could be avoided.
As a user, I find that it would be nice if my code that only depends on numpy, scipy, and matplotlib that I started at the beginning of a research project with up-to-date packages also worked at submission time with up-to-date versions of those packages with minimal changes to the code :p
Cheers, N
That's not what I am calling for; we are talking here about a first major release after a years-long development period. I am absolutely not pushing for a faster release pace ;)

Riadh.
I think Nelle is concerned that doing it once (and surviving 😉) is enough to trigger a mindset change in the core developers that breaking releases are OK, and thus can be done more frequently. For example, matplotlib 1.0 was released in July 2010, 2.0 in January 2017, then 3.0 in Sept 2018. 😬

In private discussions with Nelle she suggested that she *also* didn’t like skimage2, and in fact would advocate for gradual but breaking changes — meaning, at some point, `filters.gaussian(image)` becomes an *error* without `filters.gaussian(image, preserve_range=True)`. Then in later versions we deprecate that arg. This can all happen very slowly, but the key is that the silent behaviour change is extremely noisy somewhere in the middle. You end up with a silent behaviour change, but not before everything breaks loudly for an as-yet-unspecified period of time. The key is that people pay attention to errors, but tend to ignore warnings.

Now, for some, that *final* state, that 0.19 code ends up working differently in 0.23, even though it was broken in 0.22, is enough to say, let’s go with new-name/new-import. That’s fine.

But for others, one big issue with 1.0 is that it was impossible to warn loudly enough. So I’d like to propose what I call the chaotic good modification to the SKIP as an option:

- 0.19 is released, no deprecations, no nothing, everything is as it always was.
- over six months, we break everything we want to break on master on the road to the brave new 1.0 world.
- once we have everything where we want it, we release 0.20, identical to 0.19 but with a warning to migrate. We simultaneously release 1.0b0, so that people can migrate and depend explicitly on scikit-image>=1.0b0.
- after six months, we release 0.21. This is a completely broken release that does not import: it raises an exception telling you that you *must* either migrate or pin to continue using scikit-image.
- after an unspecified period of time, we finally release 1.0.
Releasing a broken package is definitely chaotic, but it *does* force people to pay attention, and if the error makes the fix obvious enough, people are OK with this. To me, this dramatically diminishes the risk that the transition will go unnoticed by most people.

The major downside relative to the new-package/new-import solution is still that you *do* make 10y of Stack Overflow answers obsolete in some form or another. And, to be honest, many people don’t install from requirements files; they install with `pip install [package that I am reading about for the first time]`.

Anyway, I think that in a perfect world, Riadh, yes, the version bump should be enough signal. But it’s not a perfect world and users have expectations. I don’t think it’s good enough from a “customer satisfaction” standpoint if we just say “sorry that you didn’t read the fine print on the label and are now missing an eyeball” — we should indeed try to put more measures in place than a small label to prevent people from poking their eyeballs out. 😜

As a real-world example, a bunch of people in Australia got burnt by their Thermomixes when they sloshed too much hot liquid around, even though the instructions say don’t use turbo mode on hot liquids. Thermomix ended up shipping TM5s (the new model with more safety features) to everyone who purchased a TM31 in the last x months of its life. (Including yours truly.) Moral of the story: let’s try from the beginning not to burn our users, even if they didn’t read the label. 😉

Although, the counterpoint: it is super annoying that the TM5 now holds the lid closed for 8 seconds before letting you open it. 😂

Juan.
On 21 Jul 2021, at 7:06 pm, Riadh <rfezzani@gmail.com> wrote:
Le 21/07/2021 à 10:55, Nelle Varoquaux a écrit :
Hello,
I would be a bit cautious about the "changes are not silent because the major version number is upgraded" argument. That opens the door to doing a lot more major versions in order to "allow" for API breakage when it could be avoided.
As a user, I find that it would be nice if my code that only depends on numpy, scipy, and matplotlib that I started at the beginning of a research project with up-to-date packages also worked at submission time with up-to-date versions of those packages with minimal changes to the code :p
Cheers, N
That's not what I am calling for; we are talking here about a first major release after a years-long development period.
I am absolutely not pushing for a faster release pace ;)
Riadh.
Hi Juan, On Thu, Jul 22, 2021, at 01:37, Juan Nunez-Iglesias wrote:
- 0.19 is released, no deprecations, no nothing, everything is as it always was. - over six months, we break everything we want to break on master on the road to the brave new 1.0 world. - once we have everything where we want it, we release 0.20, identical to 0.19 but with a warning to migrate. We simultaneously release 1.0b0, so that people can migrate and depend explicitly on scikit-image>=1.0b0. - after six months, we release 0.21. This is a *completely broken release* that does not import: it raises an exception that you *must* either migrate or pin to continue using scikit-image. - after an unspecified period of time, we finally release 1.0.
Unfortunately, this does not address one of our common user groups: *Scientist writes script. Scientist goes off to do something else. Scientist comes back an unspecified time later and runs script. Results are different without the code breaking.*

I am not a fan of all these versioning shenanigans; they will lead to a lot of confusion and churn.

Renaming the import is painful, but it's painful once and then it's over. It is simple and can be explained to any user. And we can do it right now, instead of juggling three or four different versions with strange characteristics.

Stéfan
Hi,

It seems to me the arguments _for_ skimage2 are pretty good: it avoids any API breakage for current code, Hinsen or otherwise, with the tradeoff of having to use a somewhat ugly package import name.

Greg referred to the arguments against in the SKIP, which are:

"""
Ultimately, the core developers felt that this approach could unnecessarily fragment the community, between those that continue using 0.19 and those that shift to 1.0.

Ultimately, the transition of downstream code to 1.0 would be equally painful as the proposed approach, but the pressure to make the switch would be decreased, as everyone installing ``scikit-image`` would still get the old version.
"""

That second paragraph worries me, because it seems to imply that you are contemplating an option that will cause a lot of code breakage, Hinsen and otherwise, specifically in order to force people to upgrade to the new API. Surely that will cause a serious breach of trust with your users? I bet they expect you to take their concerns very seriously when doing big shifts like this, whereas this looks as if you are putting heavy weight on the interests of the developers against the interests of the users. I mean, can't the users expect you to accept some reduction in speed of uptake, in order to defend them from this level of breakage?

Cheers,

Matthew

On Thu, Jul 22, 2021 at 3:48 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
Hi Juan,
On Thu, Jul 22, 2021, at 01:37, Juan Nunez-Iglesias wrote:
- 0.19 is released, no deprecations, no nothing, everything is as it always was. - over six months, we break everything we want to break on master on the road to the brave new 1.0 world. - once we have everything where we want it, we release 0.20, identical to 0.19 but with a warning to migrate. We simultaneously release 1.0b0, so that people can migrate and depend explicitly on scikit-image>=1.0b0. - after six months, we release 0.21. This is a completely broken release that does not import: it raises an exception that you *must* either migrate or pin to continue using scikit-image. - after an unspecified period of time, we finally release 1.0.
Unfortunately, this does not address one of our common user groups:
Scientist writes script. Scientist goes off to do something else. Scientist comes back an unspecified time later and runs script. Results are different without the code breaking.
I am not a fan of all these versioning shenanigans; they will lead to a lot of confusion and churn.
Renaming the import is painful, but it's painful once and then it's over. It is simple and can be explained to any user. And we can do it right now, instead of juggling three or four different versions with strange characteristics.
Stéfan
Hi, On Thu, Jul 22, 2021 at 4:39 PM Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
It seems to me the arguments _for_ skimage2 are pretty good - it avoids any API breakage for current code, Hinsen or otherwise, with the tradeoff of having to use a somewhat ugly package import name.
Greg referred to the arguments against in the SKIP - which are:
""" Ultimately, the core developers felt that this approach could unnecessarily fragment the community, between those that continue using 0.19 and those that shift to 1.0.
Ultimately, the transition of downstream code to 1.0 would be equally painful as the proposed approach, but the pressure to make the switch would be decreased, as everyone installing ``scikit-image`` would still get the old version. """
That second paragraph worries me, because it seems to imply that you are contemplating an option that will cause a lot of code breakage, Hinsen and otherwise, specifically in order to force people to upgrade to the new API. Surely that will cause a serious breach of trust with your users? I bet they expect you to take their concerns very seriously when doing big shifts like this, whereas this looks as if you are putting heavy weight on the interests of the developers against the interests of the users. I mean, can't the users expect you to accept some reduction in speed of uptake, in order to defend them from this level of breakage?
To put this another way - if you do go for the breaking 1.0 strategy, I am sure that you will get lots of users on this list saying "All my code broke, and now I see it's a lot of work to fix, why did you do this?". I am sure you don't want to find yourselves in the position of having to say "We did it to force you to move onto our new API". Cheers, Matthew
For example, matplotlib 1.0 was released in July 2010, 2.0 in January 2017, then 3.0 in Sept 2018
In defense of Matplotlib, we did the 1.0 -> 2.0 change when we changed the default style, which is a poster child for "silent significant changes"! And the 2.0 -> 3.0 transition when we dropped Python 2 support (but had minimal other changes). I do think we should be bumping major versions more often, not so that we can break more things, but because platonic semver is effectively a lie and we should be more honest about the deprecations / changes we are making.

I am weakly in favor of the `import skimage.v0` / `import skimage.v1` pattern. You can put the v0 namespace in _now_ and start documenting it as the preferred way, with the `try..except ImportError` pattern to allow back-compat support with all currently released versions. This is also a mechanical enough transformation that a futurize-style translator seems plausible (I think "sorry, we want you to change all your imports... but we have a script to do it for you!" will buy a lot of user goodwill).

You might even want to start decorating, in a 0.x release, all of the functions accessed through the top-level namespace (not the vX namespace) to suggest that the imports need to be fixed or versions pinned (this would be some slightly gnarly meta-programming, but I think doable with decorators + module properties). As you start working on 1.0, anything in v0 that changes turns into a wrapper that reproduces the old API (there are going to be some users that really want to keep the old API, so keeping the v0 shims in skimage and testing them helps ensure there really is a path back!).
When you release v1, you freeze the v0 API (no new features, only super-critical bug fixes (~security or segfault); if functions break due to upstream changes they get deprecated and removed) and let it live ~forever (code you do not worry about is not a huge maintenance burden), and leave the top-level as-is (or add/ramp up the warnings on how to import v0 _or_ migrate to v1), and in v1.3 or something like that drop the top-level access so users always do `import skimage.v1 as skimage`.

At NSLS-II we have gone through some major API and implementation re-thinks on one of our projects, used a version of the above scheme, and have been pretty happy. Having the ability to get both the "old" and the "new" in the same Python process is invaluable, as it lets you operate in a mixed mode where you have _some_ code in the notebook/script which is still using the v0 API but start writing new code in the same namespace that uses the new API. This is also valuable for downstream libraries, as it could let the developers migrate gradually to the new API without the need for a "flag day" change. This scheme also side-steps the "diamond dependency" issue, as both APIs will be supported!

Put another way, you do not want to put a graduate student in the position of saying "I _want_ to use the new API, but I have 10k LoC of inherited code using the old API .....". I think doing it within the same top-level package is better than making a new top-level package, because it is clear it is the _same_ project, not a (hostile) fork or competitor. I'm with Stéfan that subtle version shenanigans will do more harm than good.
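The versioned-namespace mechanics can be simulated in a few lines. Everything here is hypothetical (the `imgpkg` name, the `gaussian` stand-ins, the max-normalisation standing in for skimage's dtype conversion); it only illustrates how both APIs stay importable in one process, not the actual scikit-image layout:

```python
import sys
import types

# Hypothetical sketch: old (v0) and new (v1) APIs live side by side in one
# package, so a single process can mix them during a gradual migration.
pkg = types.ModuleType("imgpkg")
v0 = types.ModuleType("imgpkg.v0")
v1 = types.ModuleType("imgpkg.v1")

def gaussian_v0(image):
    # Old contract: rescale input to floats in [0, 1] (max-normalisation
    # stands in for the real dtype-based conversion).
    m = max(image)
    return [x / m for x in image]

def gaussian_v1(image):
    # New contract: preserve the input range (identity stands in for the
    # real filter, which is irrelevant to the API point).
    return list(image)

v0.gaussian = gaussian_v0
v1.gaussian = gaussian_v1
pkg.v0, pkg.v1 = v0, v1
sys.modules.update({"imgpkg": pkg, "imgpkg.v0": v0, "imgpkg.v1": v1})

# Downstream code can now pick an API explicitly, and mix both in one
# process -- which a plain version bump would not allow:
import imgpkg.v0
import imgpkg.v1

old = imgpkg.v0.gaussian([0, 5, 10])   # rescaled to [0, 1]
new = imgpkg.v1.gaussian([0, 5, 10])   # range preserved
print(old, new)
```

The registration via `sys.modules` is only there to make the sketch self-contained; in a real package, `v0` and `v1` would simply be submodules.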
A very bad idea that this would also allow is to have a helper function like `make_top_level_api('v1')` which would let you globally change which version is exposed ;) There is some precedent for this in libhdf5, which, when they break API, adds a new (numbered) function and then provides C macros to let you pick a coherent set of them with un-numbered names (or mix-and-match with the numbered versions if you want).

At the end of the day, this change is going to cause (some) pain to someone (everyone?). The more you can make the pain for the users scale with their benefit/buy-in, the better!

Tom

On Thu, Jul 22, 2021 at 4:37 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I think Nelle is concerned that doing it once (and surviving 😉) is enough to trigger a mindset change in the core developers that breaking releases are OK, and thus can be done more frequently. For example, matplotlib 1.0 was released in July 2010, 2.0 in January 2017, then 3.0 in Sept 2018. 😬
In private discussions with Nelle she suggested that she *also* didn’t like skimage2, and in fact would advocate for gradual but *breaking* changes — meaning, at some point, `filters.gaussian(image)` becomes an *error* without `filters.gaussian(image, preserve_range=True)`. Then in later versions we deprecate that arg. This can all happen very slowly but the key is that the silent behaviour change is *extremely noisy* somewhere in the middle. You end up with a silent behaviour change but not before everything breaks loudly for an as-yet-unspecified period of time. The key is that people pay attention to errors, but tend to ignore warnings.
Now, for some, that *final* state, that 0.19 code ends up working differently in 0.23, even though it was broken in 0.22, is enough to say, let’s go with new-name/new-import. That’s fine.
But for others, one big issue with 1.0 is that it was impossible to warn loudly enough. So I’d like to propose what I call the *chaotic good* modification to the SKIP as an option:
- 0.19 is released, no deprecations, no nothing, everything is as it always was. - over six months, we break everything we want to break on master on the road to the brave new 1.0 world. - once we have everything where we want it, we release 0.20, identical to 0.19 but with a warning to migrate. We simultaneously release 1.0b0, so that people can migrate and depend explicitly on scikit-image>=1.0b0. - after six months, we release 0.21. This is a *completely broken release* that does not import: it raises an exception that you *must* either migrate or pin to continue using scikit-image. - after an unspecified period of time, we finally release 1.0.
Releasing a broken package is definitely chaotic, but it *does* force people to pay attention, and if the error makes the fix obvious enough, people are OK with this.
To me, this really dramatically diminishes the risk that the transition will go unnoticed by most people.
The major downside relative to the new package/new import solution is still that you *do* make 10y of stack overflow answers obsolete in some form or another. And, to be honest, many people don’t install from requirements files, they install with `pip install [package that I am reading about for the first time]`.
Anyway, I think that in a perfect world, Riadh, yes, the version bump should be enough signal. But it’s not a perfect world and users have expectations. I don’t think it’s good enough from a “customer satisfaction” standpoint if we just say “sorry that you didn’t read the fine print on the label and are now missing an eyeball” — we should indeed try to put more measures in place than a small label to prevent people from poking their eyeballs out. 😜
As a real-world example, a bunch of people in Australia got burnt by their thermomixes when they sloshed too much hot liquid around, even though the instructions say don’t use turbo mode on hot liquids. Thermomix ended up shipping TM5s (new model with more safety features) to everyone who purchased a TM31 in the last x months of its life. (Including yours truly) Moral of the story: let’s try from the beginning not to burn our users even if they didn’t read the label. 😉
Although, the counterpoint: it is super annoying that the TM5 holds the lid closed for 8 seconds now before letting you open it. 😂
Juan.
On 21 Jul 2021, at 7:06 pm, Riadh <rfezzani@gmail.com> wrote:
Le 21/07/2021 à 10:55, Nelle Varoquaux a écrit :
Hello,
I would be a bit cautious about the "changes are not silent because the major version number is upgraded" argument. That opens the door to doing a lot more major versions in order to "allow" for API breakage when it could be avoided.
As a user, I find that it would be nice if my code that only depends on numpy, scipy, and matplotlib that I started at the beginning of a research project with up-to-date packages also worked at submission time with up-to-date versions of those packages with minimal changes to the code :p
Cheers, N
That's not what I am calling for; we are talking here about a first major release after a years-long development period.
I am absolutely not pushing for a faster release pace ;)
Riadh.
-- Thomas Caswell tcaswell@gmail.com
While we are circling this issue, it is worth noting that the substantial Hinsen-rule-breaking (or abandoning the original package name) changes being proposed are motivated by a comparatively very tiny minority of users. And I'm saying this from that minority, as I work with CT data where we care about Hounsfield units. The silent majority has been quite happy with "anything in, anything out" for the lifetime of the package. Those, like myself, who need range preservation learn to use the helpful flags which are now available, or pre-convert our data to float with a known scaling so it can be recovered.

It's also worth considering that there is a substantial corpus of scikit-image teaching material out there. The majority we do not control, so it cannot be updated or edited. The first hits on YouTube for tutorials are not the most recent, but older ones with lots of views. In virtually all cases, we tell users "anything in, anything out", and they will continue to hear and read this regardless of how strongly we might attempt to message around the change. As a result I expect the ongoing support burden will actually be worse with this change than the low-level support burden we've seen for years due to people not understanding datatype conversions.

So, I'll directly ask the question we're dancing around: *Is it worth making `preserve_range` the default?* As someone who would benefit from these changes, I am honestly no longer convinced it is. The workarounds for this problem are trivial from my standpoint as a user who does actually care about my data range, whereas the consequences of changing it at the package level are substantial and insidious, if not outright dangerous.

Josh

On Wed, Jul 21, 2021, 02:17 Riadh <rfezzani@gmail.com> wrote:
Hi, Le 21/07/2021 à 03:27, Juan Nunez-Iglesias a écrit :
Riadh thinks, correct me if I’m wrong Riadh, that breaking it for 1.0 is OK, *especially* given the 0.20 warning. To be honest, one thing I like about the 0.20 warning is that it will *teach* people to pay attention to version numbers. The other plans, not so much. And these are important not just in this skimage transition but throughout the ecosystem. The Hinsen rule is broken dozens of times across the ecosystem. Even NumPy allows this over “long” deprecation periods. (See the `copy='never'` discussion.) But, back to summaries.
That's in fact my opinion :)
@Stefan, the `copy='never'` discussion may be a bad example, but Hinsen himself cites NumPy (https://hal.archives-ouvertes.fr/hal-02117588/document) as a reason for his code to collapse: "Today’s contributors and maintainers of the scientific Python infrastructure come from backgrounds with a much faster time scale of change. For them, NumPy is probably a sufficiently stable infrastructure, whereas for me it isn’t..."
Scikit-image depends on Numpy/Scipy that are already Hinsen-rule-breakers.
Back to our discussion, changing the package name or the import name will impact all our users, while only a few of them are concerned with the Hinsen rule. Let's guide those users to use virtual envs and version pinning, which will definitely help them in developing reproducible research.

Riadh.
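A minimal sketch of the pre-conversion workaround Josh describes above for range-sensitive data such as CT in Hounsfield units: convert to floats with a *known* scale before processing, so the physical values can be recovered afterwards. The function names, the Hounsfield bounds, and the identity stand-in for a filter are all illustrative, not from scikit-image.

```python
# Sketch of the "pre-convert with a known scaling" workaround (names and
# bounds are illustrative).
HU_MIN, HU_MAX = -1024.0, 3071.0   # a typical CT Hounsfield-unit range

def to_unit(hu):
    """Map Hounsfield units onto [0, 1] floats with a known scale."""
    return [(x - HU_MIN) / (HU_MAX - HU_MIN) for x in hu]

def from_unit(u):
    """Invert the known scaling to recover physical units."""
    return [x * (HU_MAX - HU_MIN) + HU_MIN for x in u]

data = [-1000.0, 0.0, 400.0]            # air, water, bone-ish
identity_filter = lambda img: img        # stands in for any [0, 1]-float filter
recovered = from_unit(identity_filter(to_unit(data)))
print(recovered)
```

Because the scale is fixed and known up front (rather than derived per-image), the round trip is lossless up to floating-point precision, whatever the filter does to the normalised values' range policy.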
On Thu, Jul 22, 2021, at 15:12, Josh Warner wrote:
It's also worth considering that there is a substantial corpus of scikit-image teaching material out there. The majority we do not control, so cannot be updated or edited. The first hits on YouTube for tutorials are not the most recent, but older ones with lots of views. In virtually all cases, we tell users "anything in, anything out" and they will continue to hear and read this regardless of how strongly we might attempt to message around the change. As a result I expect the ongoing support burden will actually be worse with this change, than the low level support burden we've seen for years due to people not understanding datatype conversions.
So, I'll directly ask the question we're dancing around: *Is it worth making `preserve_range` the default?* As someone who would benefit from these changes, I am honestly no longer convinced it is. The workarounds for this problem are trivial from my standpoint as a user who does actually care about my data range, whereas the consequences of changing it at the package level are substantial and insidious, if not outright dangerous.
Josh makes a good point, and perhaps we should take a step back before we get too carried away.

From this discussion, there are just about as many opinions on how to do a transition as there are participants. We do not have consensus. I think one reason is that we have not taken the time to carefully write up the various categories of users, their needs, and how this would impact them. We need to compare code snippets, to see how APIs would look before and after.

But, before we go there, Josh's comment really made me wonder: why are we so convinced that the current model is inherently flawed?

Imagine, for example, we simplified the existing input model and said: input is always floats ranged [0, 1], output is always floats ranged [0, 1]. In a very few select cases it will have memory implications (CLAHE). And, yes, it's a tad annoying for people with, say, temperature data. But not much:

    scale = image.max()
    image = image / scale
    out = skimage.some.func(image)
    out = out * scale

We can then drop `preserve_range`. The "[0, 1] float in, [0, 1] float out" data model is *trivial* to explain, and there are no surprises. If we error on other input, it will break some older scripts, but we can be descriptive. Compare this to some of the more extreme changes we've discussed so far. Writing utility functions to make common tasks easier is a lot more straightforward than forcing everyone to upgrade their scripts.

Now, sure, there is a philosophical question too: should we forever be beholden to API decisions of the past? Don't we want a mechanism to move in a different direction eventually? Perhaps, but then I would argue as in the first paragraph: we need to study carefully exactly who the users are we have in mind, and what their needs are. We may even want to do a survey to see how prevalent their needs are. I.e., we would need a much more detailed SKIP jointly written by the developers / community.
So, let's take a careful look at Josh's suggestion and ask ourselves whether it is absolutely impossible to find a way out of this without silent & implicit API breakage. Best regards, Stéfan
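The utility functions Stéfan mentions could be as small as a single wrapper. This is a hedged sketch: `apply_rescaled` is an invented name, and it normalises by min *and* max, slightly generalising the `image.max()` snippet above so that data with negative values (e.g. temperatures) also lands in [0, 1]:

```python
def apply_rescaled(func, image):
    """Normalise to [0, 1], apply func, then restore the original range."""
    lo, hi = min(image), max(image)
    span = hi - lo or 1.0                      # avoid division by zero on flat images
    unit = [(x - lo) / span for x in image]    # [0, 1] floats in...
    out = func(unit)                           # ...[0, 1] floats out
    return [x * span + lo for x in out]        # back to the caller's units

# Identity stands in for a real "[0, 1] in, [0, 1] out" function:
print(apply_rescaled(lambda img: img, [-10.0, 0.0, 30.0]))  # -> [-10.0, 0.0, 30.0]
```

With a helper like this, a simple "[0, 1] float in, [0, 1] float out" core API need not inconvenience users whose data lives in other ranges.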
I would caution against restricting to the [0, 1] range in the functions. Internally, imshow in Matplotlib currently does this rescaling (because we are using Agg to do some resampling for us) and it has caused a fair amount of trouble, particularly in images with large dynamic range.

Tom

On Fri, Jul 23, 2021 at 4:32 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Thu, Jul 22, 2021, at 15:12, Josh Warner wrote:
It's also worth considering that there is a substantial corpus of scikit-image teaching material out there. The majority we do not control, so cannot be updated or edited. The first hits on YouTube for tutorials are not the most recent, but older ones with lots of views. In virtually all cases, we tell users "anything in, anything out" and they will continue to hear and read this regardless of how strongly we might attempt to message around the change. As a result I expect the ongoing support burden will actually be worse with this change, than the low level support burden we've seen for years due to people not understanding datatype conversions.
So, I'll directly ask the question we're dancing around: *Is it worth making `preserve_range` the default?* As someone who would benefit from these changes, I am honestly no longer convinced it is. The workarounds for this problem are trivial from my standpoint as a user who does actually care about my data range, whereas the consequences of changing it at the package level are substantial and insidious, if not outright dangerous.
Josh makes a good point, and perhaps we should take a step back before we get too carried away.
From this discussion, there are just about as many opinions on how to do a transition as there are participants. We do not have consensus. I think one reason is that we have not taken the time to carefully write up the various categories of users, their needs, and how this would impact them. We need to compare code snippets, to see how APIs would look before and after.
But, before we go there, Josh's comment really made me wonder: why are we so convinced that the current model is inherently flawed?
Imagine, for example, we simplified the existing input model and said: input is always floats ranged [0, 1], output is always floats ranged [0, 1].
In a very few select cases it will have memory implications (CLAHE). And, yes, it's a tad annoying for people with, say, temperature data. But not much:
scale = image.max() image = image / scale
out = skimage.some.func(image)
out = out * scale
We can then drop `preserve_range`. The "[0, 1] float in [0, 1] float out" data model is *trivial* to explain, and there are no surprises. If we error on other input, it will break some older scripts, but we can be descriptive. Compare this to some of the more extreme changes we've discussed so far. Writing utility functions to make common tasks easier is a lot more straightforward than forcing everyone to upgrade their scripts.
Now, sure, there is a philosophical question too: should we forever be beholden to API decisions of the past? Don't we want a mechanism to move in a different direction eventually? Perhaps, but then I would argue as in the first paragraph: we need to study carefully exactly who the users are we have in mind, and what their needs are. We may even want to do a survey to see how prevalent their needs are. I.e., we would need a much more detailed SKIP jointly written by the developers / community.
So, let's take a careful look at Josh's suggestion and ask ourselves whether it is absolutely impossible to find a way out of this without silent & implicit API breakage.
Best regards, Stéfan
_______________________________________________ scikit-image mailing list -- scikit-image@python.org To unsubscribe send an email to scikit-image-leave@python.org https://mail.python.org/mailman3/lists/scikit-image.python.org/ Member address: tcaswell@gmail.com
-- Thomas Caswell tcaswell@gmail.com
On Fri, Jul 23, 2021, at 14:08, Thomas Caswell wrote:
I would caution about restricting to the [0, 1] range in the functions. Internal to imshow in Matplotlib we are currently doing this rescaling (because we are using Agg to do some resampling for us) and it has caused a fair amount of trouble, particularly in images with large dynamic range.
This is the existing arrangement, by the way; we've always assumed and used this inside essentially all algorithms.

You'd need significant dynamic range before the floating-point spacing becomes an issue, I imagine?

Stéfan
Hi, On Fri, Jul 23, 2021 at 11:15 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Fri, Jul 23, 2021, at 14:08, Thomas Caswell wrote:
I would caution about restricting to the [0, 1] range in the functions. Internal to imshow in Matplotlib we are currently doing this rescaling (because we are using Agg to do some resampling for us) and it has caused a fair amount of trouble, particularly in images with large dynamic range.
This is the existing arrangement, by the way; we've always assumed and used this inside essentially all algorithms.
You'd need significant dynamic range before the floating-point spacing becomes an issue, I imagine?
Forgive my lack of experience here (I mostly work with neuroimaging data), but does the problem of large dynamic range also argue in favor of asking the user to handle and review the initial conversion to floating point, perhaps with some thoughtful helper routines, rather than doing that automatically inside the called function?

Cheers, Matthew
Hi Tom, On Fri, Jul 23, 2021, at 14:08, Thomas Caswell wrote:
I would caution about restricting to the [0, 1] range in the functions. Internal to imshow in Matplotlib we are currently doing this rescaling (because we are using Agg to do some resampling for us) and it has caused a fair amount of trouble, particularly in images with large dynamic range.
I was trying to come up with a case where this would cause problems. Do you happen to have examples? Best regards, Stéfan
See around https://github.com/matplotlib/matplotlib/blob/88f53b12e1443a9ae046ee55d1f1d6... and https://github.com/matplotlib/matplotlib/pull/17636, https://github.com/matplotlib/matplotlib/pull/10613, https://github.com/matplotlib/matplotlib/pull/10133

Where the issues tend to show up is if you have enough dynamic range that the small end is less than the difference between adjacent representable numbers at the high end, e.g.

In [5]: 1e16 == (1e16 + 1)
Out[5]: True

In some cases the scaling / unscaling does not work out the way you wish it would. While it is possible that the issues we are having are related to what we are doing with the results, forcing to [0, 1] restricts you to ~15 orders of magnitude on the whole image, which seems not ideal. While it may not be common, that Matplotlib got those bug reports says we do have users with such extreme dynamic range in the community!

That said, if you see any obvious things Matplotlib is doing wrong, please let us know!

Tom

On Fri, Jul 23, 2021 at 7:42 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
Hi Tom,
On Fri, Jul 23, 2021, at 14:08, Thomas Caswell wrote:
I would caution about restricting to the [0, 1] range in the functions. Internal to imshow in Matplotlib we are currently doing this rescaling (because we are using Agg to do some resampling for us) and it has caused a fair amount of trouble, particularly in images with large dynamic range.
I was trying to come up with a case where this would cause problems. Do you happen to have examples?
Best regards, Stéfan
-- Thomas Caswell tcaswell@gmail.com
Hi Tom, On Fri, Jul 23, 2021, at 17:57, Thomas Caswell wrote:
See around https://github.com/matplotlib/matplotlib/blob/88f53b12e1443a9ae046ee55d1f1d6... and https://github.com/matplotlib/matplotlib/pull/17636, https://github.com/matplotlib/matplotlib/pull/10613, https://github.com/matplotlib/matplotlib/pull/10133
Where the issues tend to show up is if you have enough dynamic range that the small end is less than the difference between adjacent representable numbers at the high end, e.g.
In [5]: 1e16 == (1e16 + 1) Out[5]: True
This issue would crop up if you had, e.g., uint64 images utilizing the full range. We don't support uint64 images, and uint32 is still OK on this front if you use `float64` for calculations.
In some cases the scaling / unscaling does not work out the way you wish it would. While it is possible that the issues we are having are related to what we are doing with the results, forcing to [0, 1] restricts you to ~15 orders of magnitude on the whole image which seems not ideal. While it may not be common, that Matplotlib got those bug reports says we do have users with such extreme dynamic range in the community!
15 orders of magnitude is enormous! Note that all our floating point operations internally currently happen with float64 anyway—and this is pretty much the best you can do here.

The other issue you mention is due to interpolation that sometimes goes outside the desired range; but this is an expected artifact of interpolation (which we typically have the `clip` flag for).

To be clear, I'm trying to get at the underlying issues here and identify them, not to dismiss your concerns!

Best regards, Stéfan
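Stéfan's claim about uint32 versus uint64 can be checked directly. A quick sketch in plain Python (whose floats are IEEE 754 doubles, i.e. the float64 used internally):

```python
# A uint32 value scaled to [0, 1] and back survives the round trip: the
# relative rounding error of float64 (~1e-16) leaves an absolute error
# far below half a gray level at this magnitude.
u32_max = 2**32 - 1
v = 2**32 - 7                      # an arbitrary full-range uint32 value
assert round((v / u32_max) * u32_max) == v

# Full-range uint64 is a different story: near 2**64, adjacent integers
# already collide in float64 before any scaling happens.
assert float(2**64 - 1) == float(2**64 - 2)
```

So the [0, 1] scaling itself is not where precision is lost; the limit is the 53-bit float64 significand, which uint32 fits inside and uint64 does not.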
I'm very glad to hear from you, Josh 😊, but I'm 100% convinced that removing the automatic rescaling is the right path forward.

Stéfan, "floats between [0, 1]" is easy enough to explain, except when it isn't (signed filters), or when we automatically rescale int32s in [0, 255] to floats in [0, 2**(-23)], or uint16s in [0, 4095] to floats in [0, 2**(-4)], etc. I can't count the number of times I've had to point users to "Image data types and what they mean". Floats in [0, 1] is certainly not simpler to explain than "we use floats internally for computation, period." Yes, there is a chance that we'll now get users confused about uint8 overflow/underflow, but at least then we can teach them about fundamental computer science principles, rather than about how skimage does things "just so".

As Matthew pointed out, the user is best placed to know how to manage their data scales. When we do it automagically, we often mess up. And Stéfan, to steal from your approach, we can look to our values to guide our decision-making: "we don't do magic." Let's remove the last few places where we do.

Matthew, apologies for sounding callous to users — that is absolutely not my intent! Hence this email thread. The question when aiming for a new API is how to move the community forward without fracturing it. My suggestion of "upgrade pressure" was aimed at doing this, with the implicit assumption that *limited* short term pain would result in higher long-term gain — for all our users.

I'm certainly starting to be persuaded that skimage2 is indeed the best path forward, mainly so that we don't invalidate old Q&As and tutorials. We can perhaps do a combination, though:

- skimage 0.19 is the last "real" release with the old API
- skimage2 2.0 is the next real release
- when skimage2 2.0 is released, we release skimage 0.20, which is 0.19 with a warning that scikit-image is deprecated and no longer maintained, pointing to the migration guide; if you want to keep using the deprecated API, pin to 0.19 explicitly.

That probably satisfies my "migration pressure" requirement.

Juan.

On Fri, 23 Jul 2021, at 8:29 PM, Stefan van der Walt wrote:
Hi Tom,
On Fri, Jul 23, 2021, at 17:57, Thomas Caswell wrote:
See around https://github.com/matplotlib/matplotlib/blob/88f53b12e1443a9ae046ee55d1f1d6... and https://github.com/matplotlib/matplotlib/pull/17636, https://github.com/matplotlib/matplotlib/pull/10613, https://github.com/matplotlib/matplotlib/pull/10133
Where the issues tend to show up is if you have enough dynamic range that the small end is less than the difference between adjacent representable numbers at the high end, e.g.
In [5]: 1e16 == (1e16 + 1) Out[5]: True
This issue would crop up if you had, e.g., uint64 images utilizing the full range. We don't support uint64 images, and uint32 is still OK on this front if you use `float64` for calculations.
In some cases the scaling / unscaling does not work out the way you wish it would. While it is possible that the issues we are having are related to what we are doing with the results, forcing to [0, 1] restricts you to ~15 orders of magnitude on the whole image which seems not ideal. While it may not be common, that Matplotlib got those bug reports says we do have users with such extreme dynamic range in the community!
15 orders of magnitude is enormous! Note that all our floating point operations internally currently happen with float64 anyway—and this is pretty much the best you can do here.
The other issue you mention is due to interpolation that sometimes goes outside the desired range; but this is an expected artifact of interpolation (which we typically have the `clip` flag for).
To be clear, I'm trying to get at the underlying issues here and identify them, not to dismiss your concerns!
Best regards, Stéfan
On Sat, Jul 24, 2021, at 17:58, Juan Nunez-Iglesias wrote:
I'm very glad to hear from you, Josh 😊, but I'm 100% convinced that removing the automatic rescaling is the right path forward. Stéfan, "floats between [0, 1]" is easy enough to explain, except when it isn't (signed filters), or when we automatically rescale int32s in [0, 255] to floats in [0, 2**(-23)], or uint16s in [0, 4095] to floats in [0, 2**(-4)], etc.
That's why I proposed not automatically scaling integer arrays, but erroring instead.

I also don't understand what you mean by "except when it isn't (signed filters)".

Can you motivate more carefully why our current approach is problematic and insufficient in some cases?

Stéfan
On 25 Jul 2021, at 2:59 pm, Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Sat, Jul 24, 2021, at 17:58, Juan Nunez-Iglesias wrote:
I'm very glad to hear from you, Josh 😊, but I'm 100% convinced that removing the automatic rescaling is the right path forward. Stéfan, "floats between [0, 1]" is easy enough to explain, except when it isn't (signed filters), or when we automatically rescale int32s in [0, 255] to floats in [0, 2**(-23)], or uint16s in [0, 4095] to floats in [0, 2**(-4)], etc.
That's why I proposed not automatically scaling integer arrays, but erroring instead.
I also don't understand what you mean by "except when it isn't (signed filters)".
If a linear filter contains negative values, the output will not be in [0, 1] but in [-1, 1].
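For instance, a derivative kernel with signed weights applied to a [0, 1] image yields signed output. A minimal sketch using scipy.ndimage rather than any particular skimage filter:

```python
import numpy as np
from scipy.ndimage import convolve

img = np.array([[0.0, 1.0, 1.0, 0.0]] * 4)   # a [0, 1] image with two edges
kernel = np.array([[1.0, 0.0, -1.0]])        # signed horizontal derivative

out = convolve(img, kernel)
# The edges produce responses of +1 and -1: the output lies in [-1, 1],
# not [0, 1], exactly as Juan describes for signed filters.
assert out.min() < 0
```

So a strict "output is always in [0, 1]" contract cannot hold for this class of filters without extra clipping or shifting.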
Can you motivate more carefully why our current approach is problematic and insufficient in some cases?
Our current approach is to automatically rescale. You say fine, let’s make it an error when input isn’t float in [0, 1], and force people to rescale. That’s certainly better than the current situation, but it’s still not ideal — in many, many cases, `.astype()` will do the job. Let’s aim for the ideal situation.

This isn’t the only silent change we are proposing, just the biggest. I.e., even if we do keep our [0, 1] limitation, we still need to change coordinate order in some APIs, change the return type of regionprops, rename a bunch of parameters, and move a bunch of functions around.

Juan.
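The contrast Juan draws between `.astype()` and dtype-range rescaling can be made concrete. A sketch assuming 12-bit data stored in a uint16 container; the division below mimics the rescaling behavior and is not a skimage call:

```python
import numpy as np

img = np.array([[0, 2048, 4095]], dtype=np.uint16)   # 12-bit sensor data

# A plain cast keeps the measured values intact:
as_float = img.astype(np.float64)        # 0.0, 2048.0, 4095.0

# Dtype-range rescaling divides by the uint16 maximum (65535), so a
# full-brightness 12-bit pixel lands near 0.0625 (2**-4) and the image
# looks almost black when interpreted on [0, 1]:
rescaled = img / np.iinfo(np.uint16).max
```

The cast preserves the data's meaning; the rescale bakes in an assumption about the container that the data never made.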
On Sun, Jul 25, 2021, at 01:09, Juan Nunez-Iglesias wrote:
Let’s aim for the ideal situation. This isn’t the only silent change we are proposing, just the biggest. I.e., even if we do keep our [0, 1] limitation, we still need to change coordinate order in some APIs, change the return type of regionprops, rename a bunch of parameters, and move a bunch of functions around.
The ideal is just that: an ideal. But we've now seen that there is no consensus on how to get there, because whichever route we go, it gets messy. And the advantage of not doing the breakage is significant.

I think we should still talk about the skimage2 option, but we should investigate each break, consider workarounds, and if those cannot be found, justify the change with examples.

Stéfan
I'll be brief as my internet is currently down, replying from mobile.

Of these examples and similar, I would characterize them into a couple of categories:

1. Data range user errors - the user used (almost always an overly large) type for their actual data and they end up with an image which looks all black/gray/etc.
2. Signed data of course needs to include the symmetric range [-1, 1] as a generalization of the unsigned workflow, which happens naturally since float64 is signed.
3. Overshoots/undershoots due to expected computational effects, as mentioned elsewhere in this thread; the user may or may not want to retain these, and they are uncommon.

These do represent a low-level support burden - but since the story is predictable, it presents the opportunity to guide users toward a FAQ or similar before filing a new issue. That would certainly be less disruptive than the solutions proposed!

I would assert anyone working in this space NEEDS to understand their data and its representation or they will have serious problems. It is so foundational that insulating them from the concept doesn't do them favors.

That said, the workings and logic of dtype.py are somewhat opaque. Could a featured, direct, high-yield document informing users about our conversion behavior and a FAQ serve users just as well as the heroic efforts suggested?

Josh

On Sat, Jul 24, 2021, 19:59 Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I'm very glad to hear from you, Josh 😊, but I'm 100% convinced that removing the automatic rescaling is the right path forward. Stéfan, "floats between [0, 1]" is easy enough to explain, except when it isn't (signed filters), or when we automatically rescale int32s in [0, 255] to floats in [0, 2**(-23)], or uint16s in [0, 4095] to floats in [0, 2**(-4)], etc. I can't count the number of times I've had to point users to "Image data types and what they mean". Floats in [0, 1] is certainly not simpler to explain than "we use floats internally for computation, period." Yes, there is a chance that we'll now get users confused about uint8 overflow/underflow, but at least then we can teach them about fundamental computer science principles, rather than about how skimage does things "just so".
As Matthew pointed out, the user is best placed to know how to manage their data scales. When we do it automagically, we often mess up. And Stéfan, to steal from your approach, we can look to our values to guide our decision-making: "we don't do magic." Let's remove the last few places where we do.
Matthew, apologies for sounding callous to users — that is absolutely not my intent! Hence this email thread. The question when aiming for a new API is how to move the community forward without fracturing it. My suggestion of "upgrade pressure" was aimed at doing this, with the implicit assumption that *limited* short term pain would result in higher long-term gain — for all our users.
I'm certainly starting to be persuaded that skimage2 is indeed the best path forward, mainly so that we don't invalidate old Q&As and tutorials. We can perhaps do a combination, though:
- skimage 0.19 is the last "real" release with the old API
- skimage2 2.0 is the next real release
- when skimage2 2.0 is released, we release skimage 0.20, which is 0.19 with a warning that scikit-image is deprecated and no longer maintained, pointing to the migration guide; if you want to keep using the deprecated API, pin to 0.19 explicitly.
That probably satisfies my "migration pressure" requirement.
Juan.
Hi all, as a scientific image user I have been reading along this difficult thread. Let me first pay my respect that these difficult and, by nature, opinionated (which is good!) discussions are being performed in such a civil manner! As someone who is member in a technical committee for Python software myself, I know how hard this can be.. Now to the issue at hand, I was wondering if this could be tackled as it's done in Space/Tech engineering, with a requirements documents that all should agree on, from which maybe the one and only obvious solution will emerge? I wanted to mention my personal requirements for working with an image library. Please forgive me if all of this already happens in skimage, but it's been a while since I was using it: First, and for me the most important: Pixel values are sacred and shall never be changed without letting the user know. I'm almost sure that this is the case with skimage now, but in the early days I remember I was highly surprised, annoyed even, when some routine simply insisted that the input data needs to be so and so and the result will be this format, no matter what came in. It simply resulted in being less useful for me (no complaint, I know I could have done some PRs ;) ). I will admit that us instrumentalists are completely ignorant of certain standards in proper image formats, we simply use them as co-located data containers. This means they can be ANY format: * Integers (both signed and unsigned) - with counts as high as the digitized signal required for determining the dynamic range, sometime negative because some weird amplifier randomly would suck off electrons, who knows what the engineers are cooking with ... ;) * Floats, often representing physical values after the integer format version was calibrated, but with absolutely no sensible/reasonable way to force them into some kind of range. 
The pixel values represent physics values, they don't care that a float image shouldn't be larger than 1.0 but the fact is, ALL of these pixel values are measurements with a meaning and they absolutely need to be preserved. This statement needs to be qualified though with "within reason", as obviously some "wanted" operation like a median filter to remove noise will change pixel values, but is indeed range preserving and the meaning of the data isn't lost. I understand that certain algorithms require the incoming image to be in a certain format and range, and if no "standard" wrapper can be identified that could transform and back-transform into the same range, then the user should be pointed to workarounds, but not left alone simply with the error message that the format doesn't match the algorithm. I for myself am lucky that I do not have a lot of code that I would need to change, so I wouldn't really mind any import name changes, so I think it might be much more of an discussion for the maintainers which way minimize a convolution of maintainer_effort with user_pain, but honesty, knowing how hard it is to find extra time for a passion volunteer-effort project, I'd almost always go for "least effort", because I think the community will come around (as can be seen with cv2 and other examples). I just wanted to emphasize how important the pixel values can be for us, as they literally represent the bearer of the truth from outer space, so to speak, and any change of their values shall be done only under full consideration of the consequences. My 2 opinionated cents. As always, thanks so much for everybody's effort for this project, we soon will have a technical-committee-reviewed package of many of my planetary science tools for data retrieval and data reading coming out, so I'm kinda feeling now how much damned work it is to design tools for the "community"... Best regards, Michael On Sun, Jul 25, 2021 at 11:40 AM Josh Warner <silvertrumpet999@gmail.com> wrote:
I'll be brief as my internet is currently down, replying from mobile.
Of these examples and similar, I would characterize them into a couple of categories:

1. Data range user errors - the user used (almost always an overly large) type for their actual data and they end up with an image which looks all black/gray/etc.
2. Signed data of course needs to include the symmetric range [-1, 1] as a generalization of the unsigned workflow, which happens naturally since float64 is signed.
3. Overshoots/undershoots due to expected computational effects, as mentioned elsewhere in this thread; the user may or may not want to retain these, and they are uncommon.

These do represent a low-level support burden - but since the story is predictable, it presents the opportunity to guide users toward a FAQ or similar before filing a new issue. That would certainly be less disruptive than the solutions proposed!

I would assert anyone working in this space NEEDS to understand their data and its representation or they will have serious problems. It is so foundational that insulating them from the concept doesn't do them favors.

That said, the workings and logic of dtype.py are somewhat opaque. Could a featured, direct, high-yield document informing users about our conversion behavior and a FAQ serve users just as well as the heroic efforts suggested?
Josh
On Sat, Jul 24, 2021, 19:59 Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I'm very glad to hear from you, Josh 😊, but I'm 100% convinced that removing the automatic rescaling is the right path forward. Stéfan, "floats between [0, 1]" is easy enough to explain, except when it isn't (signed filters), or when we automatically rescale int32s in [0, 255] to floats in [0, 2**(-23)], or uint16s in [0, 4095] to floats in [0, 2**(-4)], etc. I can't count the number of times I've had to point users to "Image data types and what they mean". Floats in [0, 1] is certainly not simpler to explain than "we use floats internally for computation, period." Yes, there is a chance that we'll now get users confused about uint8 overflow/underflow, but at least then we can teach them about fundamental computer science principles, rather than about how skimage does things "just so".
As Matthew pointed out, the user is best placed to know how to manage their data scales. When we do it automagically, we often mess up. And Stéfan, to steal from your approach, we can look to our values to guide our decision-making: "we don't do magic." Let's remove the last few places where we do.
Matthew, apologies for sounding callous to users — that is absolutely not my intent! Hence this email thread. The question when aiming for a new API is how to move the community forward without fracturing it. My suggestion of "upgrade pressure" was aimed at doing this, with the implicit assumption that *limited* short term pain would result in higher long-term gain — for all our users.
I'm certainly starting to be persuaded that skimage2 is indeed the best path forward, mainly so that we don't invalidate old Q&As and tutorials. We can perhaps do a combination, though:
- skimage 0.19 is the last "real" release with the old API
- skimage2 2.0 is the next real release
- when skimage2 2.0 is released, we release skimage 0.20, which is 0.19 with a warning that scikit-image is deprecated and no longer maintained, pointing to the migration guide; if you want to keep using the deprecated API, pin to 0.19 explicitly.
That probably satisfies my "migration pressure" requirement.
Juan.
Hey, on cue! Anyone care to answer this? https://stackoverflow.com/questions/68487902/why-does-the-variance-of-laplace-very-different-for-opencv-and-scikit-image ;)

Thanks Michael for chiming in. User (even long-ago user) feedback is *the most valuable* in this situation, as we maintainers can become quite detached from “majority” “real-world” use cases. =)

Stéfan, my point is that the rescaling is only one of several issues, all of which require similar acrobatics to fix. We have limited developer time to do full deprecation cycles, so making all of the changes together *and in one go* (rather than a 2-4 version deprecation dance) is the preferable approach.

Regarding consensus on *how* to make a clean break, I think you are correct that my maybe-too-clever-and-untested force-everyone-to-pin approach is dead in the water. But I also think you understate the amount of consensus on the skimage2 approach: most people seem pretty much on board with it, including early detractors like Alex, *and* it has successful models in the community (e.g. bs4 and cv2).

I also think that “raise an error on anything other than floats in [0, 1]” is an approach that will annoy many and benefit few. In other words, in my opinion: not rescaling but accepting all dtypes has usability benefits, in addition to hopefully reducing maintainer load. Raising errors on all inputs other than floats in [0, 1] would presumably also reduce maintainer load in the long term, but at the cost of (probably significant) user annoyance.

Josh, we do have such a documentation page: https://scikit-image.org/docs/dev/user_guide/data_types.html Unfortunately, it is not trivial to discover. Even with a big fat link on the front page, I suspect most users won’t find it before asking for help, because navigating documentation is hard.

FAQs/documentation links are very good and necessary when complexity is *unavoidable*, but when it is avoidable, they are a bandaid. Again, I’d prefer to point people to documentation explaining fundamentals rather than “this is just the skimage way.”

My proposal going forward is to reject SKIP-3 and create a SKIP-4 proposing the skimage2 package.

Juan.

PS: Tom, I know you expressed preference for skimage.v0/skimage.v1, but the main advantage you stated there (depending on both, migrating gradually) is also present with skimage2.
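The Stack Overflow question above almost certainly reduces to the rescaling behavior under discussion: operating on floats in [0, 1] shrinks the Laplacian, and therefore its variance, by a constant factor relative to operating on raw 8-bit values. A minimal sketch, using scipy.ndimage.laplace as a stand-in for both libraries' Laplacian (the actual question compares cv2 and skimage, whose exact kernels and defaults are not checked here, so treat the factor below as illustrative):

```python
# Why the "variance of Laplacian" focus measure differs between libraries:
# scikit-image converts uint8 input to floats in [0, 1] (dividing by 255)
# before filtering, so the Laplacian -- and hence its variance -- shrinks
# by a factor of 255**2 compared to filtering the raw 8-bit values.
import numpy as np
from scipy import ndimage as ndi

rng = np.random.default_rng(0)
img_u8 = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# "Raw" style: Laplacian of the 8-bit values (as float64, no rescale).
var_raw = ndi.laplace(img_u8.astype(np.float64)).var()

# "Rescaled" style: same filter after mapping to floats in [0, 1].
var_scaled = ndi.laplace(img_u8.astype(np.float64) / 255).var()

# Because the Laplacian is linear, the two variances differ by exactly
# the squared scale factor.
print(var_raw / var_scaled)  # ~255**2 == 65025
```

The same reasoning applies to any linear filter: dividing the input by a constant divides the output by that constant, and the variance by its square.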
On 26 Jul 2021, at 8:30 am, K.-Michael Aye <kmichael.aye@gmail.com> wrote:
Hi all,
as a scientific image user I have been reading along this difficult thread. Let me first pay my respects that these difficult and, by nature, opinionated (which is good!) discussions are being conducted in such a civil manner! As someone who is a member of a technical committee for Python software myself, I know how hard this can be.
Now to the issue at hand: I was wondering if this could be tackled as it's done in Space/Tech engineering, with a requirements document that all should agree on, from which maybe the one and only obvious solution will emerge?
I wanted to mention my personal requirements for working with an image library. Please forgive me if all of this already happens in skimage, but it's been a while since I was using it:
First, and for me the most important: Pixel values are sacred and shall never be changed without letting the user know.
I'm almost sure that this is the case with skimage now, but in the early days I remember being highly surprised, annoyed even, when some routine simply insisted that the input data needed to be so and so and that the result would be this format, no matter what came in. It simply resulted in being less useful for me (no complaint, I know I could have done some PRs ;) ).
I will admit that us instrumentalists are completely ignorant of certain standards in proper image formats; we simply use them as co-located data containers. This means they can be ANY format:
* Integers (both signed and unsigned) - with counts as high as the digitized signal required for determining the dynamic range, sometimes negative because some weird amplifier randomly would suck off electrons, who knows what the engineers are cooking with ... ;)
* Floats, often representing physical values after the integer format version was calibrated, but with absolutely no sensible/reasonable way to force them into some kind of range. The pixel values represent physics values; they don't care that a float image shouldn't be larger than 1.0
but the fact is, ALL of these pixel values are measurements with a meaning and they absolutely need to be preserved. This statement needs to be qualified with "within reason", though: obviously some "wanted" operation like a median filter to remove noise will change pixel values, but it is range-preserving and the meaning of the data isn't lost.
I understand that certain algorithms require the incoming image to be in a certain format and range, and if no "standard" wrapper can be identified that could transform and back-transform into the same range, then the user should be pointed to workarounds, but not left alone simply with the error message that the format doesn't match the algorithm.
I myself am lucky that I do not have a lot of code that I would need to change, so I wouldn't really mind any import name changes. I therefore think it is much more a discussion for the maintainers about which path minimizes the convolution of maintainer_effort with user_pain. But honestly, knowing how hard it is to find extra time for a passion volunteer-effort project, I'd almost always go for "least effort", because I think the community will come around (as can be seen with cv2 and other examples).
I just wanted to emphasize how important the pixel values can be for us, as they literally represent the bearer of the truth from outer space, so to speak, and any change of their values shall be done only under full consideration of the consequences.
My 2 opinionated cents. As always, thanks so much for everybody's effort on this project. We will soon have a technical-committee-reviewed package of many of my planetary science tools for data retrieval and data reading coming out, so I'm kinda feeling now how much damned work it is to design tools for the "community"...
Best regards, Michael
On Sun, Jul 25, 2021 at 11:40 AM Josh Warner <silvertrumpet999@gmail.com> wrote:

I'll be brief as my internet is currently down, replying from mobile.
Of these examples and similar, I would characterize them in a couple of categories:
1. Data range user errors - the user used an (almost always overly large) type for their actual data and they end up with an image which looks all black/gray/etc.
2. Signed data of course needs to include the symmetric range [-1, 1] as a generalization of the unsigned workflow, which happens naturally since float64 is signed.
3. Overshoots/undershoots due to expected computational effects, as mentioned elsewhere in this thread; the user may or may not want to retain these, and they are uncommon.
These do represent a low-level support burden - but since the story is predictable, it presents the opportunity to guide users toward a FAQ or similar before filing a new Issue. That would certainly be less disruptive than the solutions proposed!
I would assert anyone working in this space NEEDS to understand their data and its representation or they will have serious problems. It is so foundational that insulating them from the concept doesn't do them favors.
That said the workings and logic of dtype.py are somewhat opaque. Could a featured, direct, high-yield document informing users about our conversion behavior and a FAQ serve users just as well as the heroic efforts suggested?
Josh
On Sat, Jul 24, 2021, 19:59 Juan Nunez-Iglesias <jni@fastmail.com> wrote:

I'm very glad to hear from you, Josh 😊, but I'm 100% convinced that removing the automatic rescaling is the right path forward. Stéfan, "floats between [0, 1]" is easy enough to explain, except when it isn't (signed filters), or when we automatically rescale int32s in [0, 255] to floats in [0, ~2**(-23)], or uint16s in [0, 4095] to floats in [0, 2**(-4)], etc. I can't count the number of times I've had to point users to "Image data types and what they mean". Floats in [0, 1] is certainly not simpler to explain than "we use floats internally for computation, period." Yes, there is a chance that we'll now get users confused about uint8 overflow/underflow, but at least then we can teach them about fundamental computer science principles, rather than about how skimage does things "just so".
As Matthew pointed out, the user is best placed to know how to manage their data scales. When we do it automagically, we often mess up. And Stéfan, to steal from your approach, we can look to our values to guide our decision-making: "we don't do magic." Let's remove the last few places where we do.
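For the record, the conversions under discussion are easy to demonstrate with plain NumPy (no skimage required). The divide-by-dtype-maximum rule below mirrors the legacy rescaling behavior described above, and the uint8 arithmetic at the end shows the overflow pitfall that removing it would expose:

```python
import numpy as np

# Legacy-style rescaling: integer dtypes are divided by the dtype's maximum,
# so images that don't use the full dtype range land in a surprising
# sub-interval of [0, 1].
u16 = np.array([0, 4095], dtype=np.uint16)   # 12-bit data in a 16-bit container
print(u16 / np.iinfo(np.uint16).max)         # [0.  ~0.0625]   i.e. [0, ~2**-4]

i32 = np.array([0, 255], dtype=np.int32)     # 8-bit data in a 32-bit container
print(i32 / np.iinfo(np.int32).max)          # [0.  ~1.2e-07]  i.e. [0, ~2**-23]

# The pitfall on the other side, without rescaling: native uint8 arithmetic
# wraps around instead of saturating.
a = np.array([200], dtype=np.uint8)
b = np.array([100], dtype=np.uint8)
print(a + b)                      # [44] -- wrapped modulo 256, not 300
print(a.astype(np.float64) + b)   # [300.] -- convert to float first, no wrap
```

Neither behavior is magic once seen side by side, which is the point about teaching fundamentals rather than skimage-specific conventions.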
Matthew, apologies for sounding callous to users — that is absolutely not my intent! Hence this email thread. The question when aiming for a new API is how to move the community forward without fracturing it. My suggestion of "upgrade pressure" was aimed at doing this, with the implicit assumption that *limited* short term pain would result in higher long-term gain — for all our users.
I'm certainly starting to be persuaded that skimage2 is indeed the best path forward, mainly so that we don't invalidate old Q&As and tutorials. We can perhaps do a combination, though:
- skimage 0.19 is the last "real" release with the old API
- skimage2 2.0 is the next real release
- when skimage2 2.0 is released, we release skimage 0.20, which is 0.19 with a warning that scikit-image is deprecated and no longer maintained, pointing to the migration guide; anyone who wants to keep using the deprecated API should pin to 0.19 explicitly.
That probably satisfies my "migration pressure" requirement.
Juan.
On Fri, 23 Jul 2021, at 8:29 PM, Stefan van der Walt wrote:
Hi Tom,
On Fri, Jul 23, 2021, at 17:57, Thomas Caswell wrote:
See around https://github.com/matplotlib/matplotlib/blob/88f53b12e1443a9ae046ee55d1f1d6c3391eff22/lib/matplotlib/image.py#L392-L542 and https://github.com/matplotlib/matplotlib/pull/17636, https://github.com/matplotlib/matplotlib/pull/10613, https://github.com/matplotlib/matplotlib/pull/10133
Where the issues tend to show up is if you have enough dynamic range that the small end is less than the difference between adjacent representable numbers at the high end, e.g.

In [5]: 1e16 == (1e16 + 1)
Out[5]: True
This issue would crop up if you had, e.g., uint64 images utilizing the full range. We don't support uint64 images, and uint32 is still OK on this front if you use `float64` for calculations.
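Both of these points are easy to verify directly. A quick check of float64's integer-resolution limits (plain NumPy, nothing skimage-specific):

```python
import numpy as np

# float64 carries ~15-16 significant decimal digits; above 2**53 the gap
# between adjacent representable values exceeds 1, which is the In [5]
# example from the thread:
print(1e16 == 1e16 + 1)   # True: adding 1 falls below the representable gap
print(np.spacing(1e16))   # 2.0: distance from 1e16 to the next larger float64

# Every uint32 value survives a round-trip through float64 exactly ...
assert float(np.uint32(2**32 - 1)) == 2**32 - 1

# ... but uint64 values do not: near the top of the range, distinct
# integers collapse onto the same float64 value.
print(float(2**63) == float(2**63 + 1))   # True: two different integers
```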
In some cases the scaling/unscaling does not work out the way you wish it would. While it is possible that the issues we are having are related to what we are doing with the results, forcing to [0, 1] restricts you to ~15 orders of magnitude on the whole image, which seems not ideal. While it may not be common, the fact that Matplotlib got those bug reports says we do have users with such extreme dynamic range in the community!
15 orders of magnitude is enormous! Note that all our floating point operations internally currently happen with float64 anyway—and this is pretty much the best you can do here.
The other issue you mention is due to interpolation that sometimes goes outside the desired range; but this is an expected artifact of interpolation (which we typically have the `clip` flag for).
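A small sketch of that artifact, using scipy.ndimage.zoom rather than skimage's own warping functions (so the exact overshoot magnitude is illustrative only): cubic interpolation of a step edge rings at the discontinuity, producing values outside the input range, and clipping restores the range afterwards.

```python
import numpy as np
from scipy import ndimage as ndi

# A hard step edge, already in [0, 1].
edge = np.zeros((1, 16))
edge[:, 8:] = 1.0

# Cubic spline interpolation rings at the discontinuity, producing values
# outside the input range -- the expected interpolation artifact.
zoomed = ndi.zoom(edge, (1, 4), order=3)
print(zoomed.min() < 0.0, zoomed.max() > 1.0)

# Clipping back to the input range is the usual remedy (exposed as a
# `clip` flag on skimage's warping functions).
clipped = np.clip(zoomed, 0.0, 1.0)
assert clipped.min() >= 0.0 and clipped.max() <= 1.0
```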
To be clear, I'm trying to get at the underlying issues here and identify them; not to dismiss your concerns!
Best regards, Stéfan
On Mon, Jul 26, 2021 at 1:57 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:

I also think that “raise an error on anything other than floats in 0-1” is an approach that will annoy many and benefit few. In other words, in my opinion: not rescaling but accepting all dtypes has usability benefits, in addition to hopefully reducing maintainer load, *but* raising errors on all inputs other than floats in [0, 1] will also presumably reduce maintainer load in the long term, but at the cost of (probably significant) user annoyance.
I am more in favor of a skimage2 (or similar) approach than the pinning approach in the SKIP, particularly as the discussion here has progressed.

Regarding automatically scaling to [0, 1], I am definitely not in favor of going back to that for floating point data! We changed `img_as_float` to preserve range on float inputs quite a while ago at this point, and it would be pretty annoying to users to switch it back again. I can see some argument for keeping the current scaling for the integer cases, but I still have a feeling it is likely better not to force rescaling there either. Having data unexpectedly rescaled was probably the most annoying aspect to me as a user in medical imaging applications. An additional point in favor of not rescaling is consistency with scipy.ndimage.

Josh, we do have such a documentation page:
https://scikit-image.org/docs/dev/user_guide/data_types.html
Unfortunately, it is not trivial to discover. Even with a big fat link on the front page, I suspect most users won’t find it before asking for help, because navigating documentation is hard. FAQs/documentation links are very good and necessary when complexity is *unavoidable*, but when it is avoidable, they are a bandaid.
We have fortunately been able to get quite a few users to visit that page (>20k visitors in the past 6 months according to our web metrics: https://github.com/scikit-image/scikit-image/pull/5489#issuecomment-886208262). It is the most-visited of the user guide pages, but not as visited as the installation page or some of the example and API pages.
Hello, and sorry for the short answer from my phone. As you may already know, I prefer the skimage.v0 option to skimage2. Concerning the preserve_range problem, what about making it keyword-only in v1? No silent error in this case...

Riadh

On Monday, 26 July 2021, Gregory Lee <grlee77@gmail.com> wrote:
On Mon, Jul 26, 2021 at 1:57 AM Juan Nunez-Iglesias <jni@fastmail.com>
Hey, on cue! Anyone care to answer this?
https://stackoverflow.com/questions/68487902/why-does-the-variance-of-laplac...
;) Thanks Michael for chiming in. User (even long-ago user) feedback is *the most valuable* in this situation, as we maintainers can become quite detached from “majority” “real-world” use cases. =) Stéfan, my point is that the rescaling is only one of several issues, all of which require similar acrobatics to achieve. We have limited developer time to do full deprecation cycles, so making all of them together *and in one go* (rather than a 2-4 version deprecation dance) is a
Regarding consensus on *how* to make a clean break, I think that you are correct that my maybe-too-clever-and-untested force-everyone-to-pin approach is dead in the water. But I also think that you understate the amount of consensus on the skimage2 approach: most people seem pretty much on board with it including early detractors like Alex, *and* it has successful models in the community (eg. bs4 and cv2). I also think that “raise an error on anything other than floats in 0-1” is an approach that will annoy many and benefit few. In other words, in my opinion: not rescaling but accepting all dtypes has usability benefits, in addition to hopefully reducing maintainer load, *but* raising errors on all inputs other than floats in [0, 1] will also presumably reduce maintainer load in the long term, but at the cost of (probably significant) user annoyance.
I am more in favor of a skimage2 (or similar) approach than the pinning approach in the SKIP, particularly as the discussion here has progressed.
Regarding automatically scaling to [0, 1], I am definitely not in favor of going back to that for floating point data! We changed `img_as_float` to
wrote: preferable approach. preserve range on float inputs quite a while ago at this point and it would be pretty annoying to users to switch it back again. I can see some argument for keeping the current scaling for the integer cases, but I still have a feeling it is likely better to not force rescaling there either. Having data unexpectedly rescaled was probably the most annoying aspect to me as a user in medical imaging applications. An additional point in favor of not rescaling is consistency with scipy.ndimage.
Josh, we do have such a documentation page: https://scikit-image.org/docs/dev/user_guide/data_types.html Unfortunately, it is not trivial to discover. Even with a big fat link
on the front page, I suspect most users won’t find it before asking for help, because navigating documentation is hard. FAQs/documentation links are very good and necessary when complexity is *unavoidable*, but when it is avoidable, they are a bandaid.
We have fortunately been able to get quite a few users to visit that page
(>20k visitors in the past 6 months according to our web metrics). It is the most-visited of the user guide pages, but not as visited as the installation page or some of the example and API pages.
Again, I’d prefer to point people to documentation explaining
My proposal going forward is to reject SKIP-3 and create a SKIP-4
Juan. PS: Tom, I know you expressed preference for skimage.v0/skimage.v1, but
On 26 Jul 2021, at 8:30 am, K.-Michael Aye <kmichael.aye@gmail.com>
wrote:
Hi all, as a scientific image user I have been reading along this difficult
Let me first pay my respect that these difficult and, by nature, opinionated (which is good!) discussions are being performed in such a civil manner! As someone who is member in a technical committee for Python software myself, I know how hard this can be.. Now to the issue at hand, I was wondering if this could be tackled as it's done in Space/Tech engineering, with a requirements documents that all should agree on, from which maybe the one and only obvious solution will emerge? I wanted to mention my personal requirements for working with an image
Please forgive me if all of this already happens in skimage, but it's been a while since I was using it: First, and for me the most important: Pixel values are sacred and shall never be changed without letting the user know. I'm almost sure that this is the case with skimage now, but in the early days I remember I was highly surprised, annoyed even, when some routine simply insisted that the input data needs to be so and so and the result will be this format, no matter what came in. It simply resulted in being less useful for me (no complaint, I know I could have done some PRs ;) ). I will admit that us instrumentalists are completely ignorant of certain standards in proper image formats, we simply use them as co-located data containers. This means they can be ANY format: * Integers (both signed and unsigned) - with counts as high as the digitized signal required for determining
* Floats, often representing physical values after the integer format version was calibrated, but with absolutely no sensible/reasonable way to force them into some kind of range. The pixel values represent physics values, they don't care that a float image shouldn't be larger than 1.0 but the fact is, ALL of these pixel values are measurements with a meaning and they absolutely need to be preserved. This statement needs to be qualified though with "within reason", as obviously some "wanted" operation like a median filter to remove noise will change pixel values, but is indeed range preserving and the meaning of the data isn't lost. I understand that certain algorithms require the incoming image to be in a certain format and range, and if no "standard" wrapper can be identified
I for myself am lucky that I do not have a lot of code that I would need to change, so I wouldn't really mind any import name changes. I think it might be much more of a discussion for the maintainers about which way minimizes a convolution of maintainer_effort with user_pain, but honestly, knowing how hard it is to find extra time for a passion volunteer-effort project, I'd almost always go for "least effort", because I think the community will come around (as can be seen with cv2 and other examples).

I just wanted to emphasize how important the pixel values can be for us, as they literally represent the bearer of the truth from outer space, so to speak, and any change to their values shall be done only under full consideration of the consequences.

My 2 opinionated cents. As always, thanks so much for everybody's effort on this project. We soon will have a technical-committee-reviewed package of many of my planetary science tools for data retrieval and data reading coming out, so I'm kinda feeling now how much damned work it is to design tools for the "community"...
Best regards, Michael
On Sun, Jul 25, 2021 at 11:40 AM Josh Warner <silvertrumpet999@gmail.com> wrote:
I'll be brief as my internet is currently down, replying from mobile. Of these examples and similar, I would characterize them in a couple of categories:

1. Data range user errors - the user used (almost always an overly large) type for their actual data, and they end up with an image which looks all black/gray/etc.
2. Signed data of course needs to include the symmetric range [-1, 1] as a generalization of the unsigned workflow, which happens naturally since float64 is signed.
3. Overshoots/undershoots due to expected computational effects, as mentioned elsewhere in this thread; the user may or may not want to retain these, and they are uncommon.

These do represent a low-level support burden - but since the story is predictable, it presents the opportunity to guide users toward a FAQ or similar before filing a new issue. That would certainly be less disruptive than the solutions proposed!

I would assert anyone working in this space NEEDS to understand their data and its representation or they will have serious problems. It is so foundational that insulating them from the concept doesn't do them favors. That said, the workings and logic of dtype.py are somewhat opaque. Could a featured, direct, high-yield document informing users about our conversion behavior and a FAQ serve users just as well as the heroic efforts suggested?

Josh

On Sat, Jul 24, 2021, 19:59 Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I'm very glad to hear from you, Josh 😊, but I'm 100% convinced that
removing the automatic rescaling is the right path forward. Stéfan, "floats between [0, 1]" is easy enough to explain, except when it isn't (signed filters), or when we automatically rescale int32s in [0, 255] to floats in [0, 2**(-31)], or uint16s in [0, 4095] to floats in [0, 2**(-4)], etc. I can't count the number of times I've had to point users to "Image data types and what they mean". Floats in [0, 1] is certainly not simpler to explain than "we use floats internally for computation, period." Yes, there is a chance that we'll now get users confused about uint8 overflow/underflow, but at least then we can teach them about fundamental computer science principles, rather than about how skimage does things "just so".
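To make both halves of that trade-off concrete, here is a small numpy-only sketch (no skimage call is made; the division by the dtype maximum mirrors the arithmetic skimage's automatic conversion applies to unsigned integers, and the sample values are invented):

```python
import numpy as np

# A 12-bit sensor image stored in a uint16 container: values span [0, 4095].
img = np.array([[0, 1000], [2048, 4095]], dtype=np.uint16)

# Automatic conversion for unsigned ints divides by the dtype maximum
# (65535 for uint16), so the floats top out near 2**-4 instead of 1.0 --
# shown with a [0, 1] colormap, the image "looks all black".
as_float = img / np.iinfo(np.uint16).max
print(as_float.max())  # ~0.0625

# The flip side without conversion: integer arithmetic wraps around.
a = np.array([200], dtype=np.uint8)
b = np.array([100], dtype=np.uint8)
print(a + b)  # [44], because 300 % 256 == 44
```

The first surprise is skimage-specific convention; the second is a fundamental property of fixed-width integers that any image-processing user eventually has to learn.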
As Matthew pointed out, the user is best placed to know how to manage their data scales. When we do it automagically, we often mess up. And Stéfan, to steal from your approach, we can look to our values to guide our decision-making: "we don't do magic." Let's remove the last few places where we do.
Matthew, apologies for sounding callous to users — that is absolutely
not my intent! Hence this email thread. The question when aiming for a new API is how to move the community forward without fracturing it. My suggestion of "upgrade pressure" was aimed at doing this, with the implicit assumption that *limited* short term pain would result in higher long-term gain — for all our users.
I'm certainly starting to be persuaded that skimage2 is indeed the
best path forward, mainly so that we don't invalidate old Q&As and tutorials. We can perhaps do a combination, though:
- skimage 0.19 is the last "real" release with the old API
- skimage2 2.0 is the next real release
- when skimage 2.0 is released, we release skimage 0.20, which is 0.19 with a warning that scikit-image is deprecated and no longer maintained, pointing to the migration guide; if you want to keep using the deprecated API, pin to 0.19 explicitly.
That probably satisfies my "migration pressure" requirement.
Juan.
On Fri, 23 Jul 2021, at 8:29 PM, Stefan van der Walt wrote:
Hi Tom,
On Fri, Jul 23, 2021, at 17:57, Thomas Caswell wrote:
See around
https://github.com/matplotlib/matplotlib/blob/88f53b12e1443a9ae046ee55d1f1d6... https://github.com/matplotlib/matplotlib/pull/17636, https://github.com/matplotlib/matplotlib/pull/10613, https://github.com/matplotlib/matplotlib/pull/10133
Where the issues tend to show up is if you have enough dynamic range that the small end is less than the difference between adjacent representable numbers at the high end, e.g.

In [5]: 1e16 == (1e16 + 1)
Out[5]: True
This issue would crop up if you had, e.g., uint64 images utilizing the
full range. We don't support uint64 images, and uint32 is OK still on this front if you use `float64` for calculations.
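A quick numpy check of the precision argument (illustrative only):

```python
import numpy as np

# At 1e16, adjacent float64 values are 2.0 apart, so adding 1 is a no-op:
print(1e16 == 1e16 + 1)   # True
print(np.spacing(1e16))   # 2.0 -- the gap to the next representable float

# Every integer up to 2**53 is exactly representable in float64, so the
# full uint32 range (max ~4.29e9) survives a float64 round trip intact.
m = np.iinfo(np.uint32).max
print(int(np.float64(m)) == m)  # True
```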
In some cases the scaling / unscaling does not work out the way you
wish it would. While it is possible that the issues we are having are related to what we are doing with the results, forcing to [0, 1] restricts you to ~15 orders of magnitude on the whole image which seems not ideal. While it may not be common, that Matplotlib got those bug reports says we do have users with such extreme dynamic range in the community!
15 orders of magnitude is enormous! Note that all our floating point operations internally currently happen with float64 anyway—and this is pretty much the best you can do here.
The other issue you mention is due to interpolation that sometimes
goes outside the desired range; but this is an expected artifact of interpolation (which we typically have the `clip` flag for).
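The overshoot is easy to reproduce without any skimage call. The Catmull-Rom cubic below (a stand-in for the spline interpolation skimage uses; the helper name is mine) undershoots a 0 → 1 step edge, which is exactly the artifact a `clip` flag papers over:

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, t):
    """Cubic (Catmull-Rom) interpolation between p1 and p2, 0 <= t <= 1."""
    return 0.5 * (2 * p1
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t**2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t**3)

# Sampling just before a step edge 0 -> 1 dips below the input range:
val = catmull_rom(0.0, 0.0, 0.0, 1.0, 2 / 3)  # -2/27, i.e. ~-0.074
clipped = np.clip(val, 0.0, 1.0)              # what clip=True would return
print(val, clipped)
```

Whether the user wants the raw overshoot (e.g. for further computation) or the clipped result (e.g. for display) is exactly the kind of decision only the user can make.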
To be clear, I'm trying to get at the underlying issues here and identify them; not to dismiss your concerns!
Best regards, Stéfan _______________________________________________ scikit-image mailing list -- scikit-image@python.org To unsubscribe send an email to scikit-image-leave@python.org https://mail.python.org/mailman3/lists/scikit-image.python.org/ Member address: jni@fastmail.com
Hello, and sorry for the short answer from my phone. As you may already know, I prefer the skimage.v0 option to skimage2. Concerning the preserve_range problem, what about making it keyword-only in v1? No silent error in this case...

Riadh
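A minimal sketch of that idea (hypothetical names throughout; this is not the real skimage signature): with `preserve_range` keyword-only and given no default, Python itself forces every caller to state their intent:

```python
import numpy as np

def _to_unit_float(image):
    # hypothetical helper: rescale unsigned ints into [0, 1]
    return image / np.iinfo(image.dtype).max

def rescale(image, scale, *, preserve_range):
    # The bare `*` makes preserve_range keyword-only; having no default
    # means omitting it raises TypeError instead of silently rescaling.
    img = image.astype(np.float64) if preserve_range else _to_unit_float(image)
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

img = np.array([[0, 4095]], dtype=np.uint16)
out = rescale(img, 2, preserve_range=True)   # pixel values untouched
# rescale(img, 2)  -> TypeError: missing required keyword-only argument
```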
Le lundi 26 juillet 2021, Gregory Lee <grlee77@gmail.com> a écrit :
On Mon, Jul 26, 2021 at 1:57 AM Juan Nunez-Iglesias <jni@fastmail.com>
wrote:
Hey, on cue! Anyone care to answer this?
https://stackoverflow.com/questions/68487902/why-does-the-variance-of-laplac...
;) Thanks Michael for chiming in. User (even long-ago user) feedback is *the most valuable* in this situation, as we maintainers can become quite detached from "majority" "real-world" use cases. =) Stéfan, my point is that the rescaling is only one of several issues, all of which require similar acrobatics to achieve. We have limited developer time to do full deprecation cycles, so making all of them together *and in one go* (rather than a 2-4 version deprecation dance) is a preferable approach.
Regarding consensus on *how* to make a clean break, I think that you are correct that my maybe-too-clever-and-untested force-everyone-to-pin approach is dead in the water. But I also think that you understate the amount of consensus on the skimage2 approach: most people seem pretty much on board with it including early detractors like Alex, *and* it has successful models in the community (eg. bs4 and cv2). I also think that “raise an error on anything other than floats in 0-1” is an approach that will annoy many and benefit few. In other words, in my opinion: not rescaling but accepting all dtypes has usability benefits, in addition to hopefully reducing maintainer load, *but* raising errors on all inputs other than floats in [0, 1] will also presumably reduce maintainer load in the long term, but at the cost of (probably significant) user annoyance.
I am more in favor of a skimage2 (or similar) approach than the pinning approach in the SKIP, particularly as the discussion here has progressed.
Regarding automatically scaling to [0, 1], I am definitely not in favor of going back to that for floating point data! We changed `img_as_float` to preserve range on float inputs quite a while ago at this point, and it would be pretty annoying to users to switch it back again. I can see some argument for keeping the current scaling for the integer cases, but I still have a feeling it is likely better to not force rescaling there either. Having data unexpectedly rescaled was probably the most annoying aspect to me as a user in medical imaging applications. An additional point in favor of not rescaling is consistency with scipy.ndimage.
Josh, we do have such a documentation page: https://scikit-image.org/docs/dev/user_guide/data_types.html Unfortunately, it is not trivial to discover. Even with a big fat link
on the front page, I suspect most users won’t find it before asking for help, because navigating documentation is hard. FAQs/documentation links are very good and necessary when complexity is *unavoidable*, but when it is avoidable, they are a bandaid.
We have fortunately been able to get quite a few users to visit that page (>20k visitors in the past 6 months according to our web metrics). It is the most-visited of the user guide pages, but not as visited as the installation page or some of the example and API pages.
Again, I’d prefer to point people to documentation explaining
fundamentals rather than “this is just the skimage way.”
My proposal going forward is to reject SKIP-3 and create a SKIP-4 proposing the skimage2 package.
Juan.

PS: Tom, I know you expressed preference for skimage.v0/skimage.v1, but the main advantage you stated there (depending on both, migrating gradually) is also present with skimage2.
On 26 Jul 2021, at 8:30 am, K.-Michael Aye <kmichael.aye@gmail.com>
wrote:
Hi all, as a scientific image user I have been reading along this difficult thread. Let me first pay my respect that these difficult and, by nature, opinionated (which is good!) discussions are being performed in such a civil manner! As someone who is a member of a technical committee for Python software myself, I know how hard this can be...

Now to the issue at hand: I was wondering if this could be tackled as it's done in space/tech engineering, with a requirements document that all should agree on, from which maybe the one and only obvious solution will emerge? I wanted to mention my personal requirements for working with an image library.
And sorry for the automatic french grammar corrections :(
On Wednesday, 28 July 2021, Riadh Fezzani <rfezzani@gmail.com> wrote:
Le lundi 26 juillet 2021, Gregory Lee <grlee77@gmail.com> a écrit :
On Mon, Jul 26, 2021 at 1:57 AM Juan Nunez-Iglesias <jni@fastmail.com>
wrote:
Hey, on cue! Anyone care to answer this?
https://stackoverflow.com/questions/68487902/why-does-the-variance-of-laplac...
;) Thanks Michael for chiming in. User (even long-ago user) feedback is *the most valuable* in this situation, as we maintainers can become quite detached from “majority” “real-world” use cases. =) Stéfan, my point is that the rescaling is only one of several issues, all of which require similar acrobatics to achieve. We have limited developer time to do full deprecation cycles, so making all of them together *and in one go* (rather than a 2-4 version deprecation dance) is a
Regarding consensus on *how* to make a clean break, I think that you are correct that my maybe-too-clever-and-untested force-everyone-to-pin approach is dead in the water. But I also think that you understate the amount of consensus on the skimage2 approach: most people seem pretty much on board with it including early detractors like Alex, *and* it has successful models in the community (eg. bs4 and cv2). I also think that “raise an error on anything other than floats in 0-1” is an approach that will annoy many and benefit few. In other words, in my opinion: not rescaling but accepting all dtypes has usability benefits, in addition to hopefully reducing maintainer load, *but* raising errors on all inputs other than floats in [0, 1] will also presumably reduce maintainer load in the long term, but at the cost of (probably significant) user annoyance.
I am more in favor of a skimage2 (or similar) approach than the pinning approach in the SKIP, particularly as the discussion here has progressed.
Regarding automatically scaling to [0, 1], I am definitely not in favor of going back to that for floating point data! We changed `img_as_float` to
Josh, we do have such a documentation page: https://scikit-image.org/docs/dev/user_guide/data_types.html Unfortunately, it is not trivial to discover. Even with a big fat link
on the front page, I suspect most users won’t find it before asking for help, because navigating documentation is hard. FAQs/documentation links are very good and necessary when complexity is *unavoidable*, but when it is avoidable, they are a bandaid.
We have fortunately been able to get quite a few users to visit that
Again, I’d prefer to point people to documentation explaining
fundamentals rather than “this is just the skimage way.”
My proposal going forward is to reject SKIP-3 and create a SKIP-4
Juan. PS: Tom, I know you expressed preference for skimage.v0/skimage.v1, but the main advantage you stated there (depending on both, migrating gradually) is also present with skimage2.
On 26 Jul 2021, at 8:30 am, K.-Michael Aye <kmichael.aye@gmail.com> wrote: Hi all, as a scientific image user I have been reading along this difficult
Let me first pay my respect that these difficult and, by nature, opinionated (which is good!) discussions are being performed in such a civil manner! As someone who is member in a technical committee for Python software myself, I know how hard this can be.. Now to the issue at hand, I was wondering if this could be tackled as it's done in Space/Tech engineering, with a requirements documents that all should agree on, from which maybe the one and only obvious solution will emerge? I wanted to mention my personal requirements for working with an image
Please forgive me if all of this already happens in skimage, but it's been a while since I was using it: First, and for me the most important: Pixel values are sacred and shall never be changed without letting the user know. I'm almost sure that this is the case with skimage now, but in the early days I remember I was highly surprised, annoyed even, when some routine simply insisted that the input data needs to be so and so and the result will be this format, no matter what came in. It simply resulted in being less useful for me (no complaint, I know I could have done some PRs ;) ). I will admit that us instrumentalists are completely ignorant of certain standards in proper image formats, we simply use them as co-located data containers. This means they can be ANY format: * Integers (both signed and unsigned) - with counts as high as the digitized signal required for determining
* Floats, often representing physical values after the integer format version was calibrated, but with absolutely no sensible/reasonable way to force them into some kind of range. The pixel values represent physics values, they don't care that a float image shouldn't be larger than 1.0 but the fact is, ALL of these pixel values are measurements with a meaning and they absolutely need to be preserved. This statement needs to be qualified though with "within reason", as obviously some "wanted" operation like a median filter to remove noise will change pixel values, but is indeed range preserving and the meaning of the data isn't lost. I understand that certain algorithms require the incoming image to be in a certain format and range, and if no "standard" wrapper can be identified that could transform and back-transform into the same range,
I for myself am lucky that I do not have a lot of code that I would
need to change, so I wouldn't really mind any import name changes, so I
I just wanted to emphasize how important the pixel values can be for us, as they literally represent the bearer of the truth from outer space, so to speak, and any change of their values shall be done only under full consideration of the consequences. My 2 opinionated cents. As always, thanks so much for everybody's effort for this project, we soon will have a technical-committee-reviewed package of many of my
Best regards, Michael
On Sun, Jul 25, 2021 at 11:40 AM Josh Warner < silvertrumpet999@gmail.com> wrote:
I'll be brief as my internet is currently down, replying from mobile. Of these examples and similar, I would characterize them in a couple
categories
1. Data range user errors - the user used (almost always an overly large) type for their actual data and they end up with an image which looks all black/gray/etc. 2. Signed data of course needs to include the symmetric range [-1, 1] as a generalization of unsigned workflow, which happens naturally since float64 is signed. 3. Overshoots/undershoots due to expected computational effects, as mentioned elsewhere in this thread; user may or may not want to retain
These do represent a low level support burden - but since the story is predictable, presents the opportunity to guide users toward a FAQ or similar before filling a new Issue. That would certainly be less disruptive than the solutions proposed! I would assert anyone working in this space NEEDS to understand their data and its representation or they will have serious problems. It is so foundational that insulating them from the concept doesn't do them favors. That said the workings and logic of dtype.py are somewhat opaque. Could a featured, direct, high-yield document informing users about our conversion behavior and a FAQ serve users just as well as the heroic efforts suggested? Josh On Sat, Jul 24, 2021, 19:59 Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I'm very glad to hear from you, Josh 😊, but I'm 100% convinced that
removing the automatic rescaling is the right path forward. Stéfan, "floats between [0, 1]" is easy enough to explain, except when it isn't (signed filters), or when we automatically rescale int32s in [0, 255] to floats in [0, 2**(-31)], or uint16s in [0, 4095] to floats in [0, 2**(-4)], etc. I can't count the number of times I've had to point users to "Image data types and what they mean". Floats in [0, 1] is certainly not simpler to explain than "we use floats internally for computation, period." Yes, there is a chance that we'll now get users confused about uint8 overflow/underflow, but at least then we can teach them about fundamental computer science principles, rather than about how skimage does things "just so".
As Matthew pointed out, the user is best placed to know how to manage their data scales. When we do it automagically, we often mess up. And Stéfan, to steal from your approach, we can look to our values to guide our decision-making: "we don't do magic." Let's remove the last few places where we do.
Matthew, apologies for sounding callous to users — that is absolutely not my intent! Hence this email thread. The question when aiming for a new API is how to move the community forward without fracturing it. My suggestion of "upgrade pressure" was aimed at doing this, with the implicit assumption that *limited* short term pain would result in higher long-term gain — for all our users.
I'm certainly starting to be persuaded that skimage2 is indeed the best path forward, mainly so that we don't invalidate old Q&As and tutorials. We can perhaps do a combination, though:
skimage 0.19 is the last "real" release with the old API skimage2 2.0 is the next real release when skimage 2.0 is release, we release skimage 0.20, which is 0.19 with a warning that scikit-image is deprecated and no longer maintained, and point to the migration guide, and if you want to keep using the deprecated API, pin to 0.19 explicitly.
That probably satisfies my "migration pressure" requirement.
Juan.
On Fri, 23 Jul 2021, at 8:29 PM, Stefan van der Walt wrote:
Hi Tom,
On Fri, Jul 23, 2021, at 17:57, Thomas Caswell wrote:
See around https://github.com/matplotlib/matplotlib/blob/88f53b12e1443a9ae046ee55d1f1d6... https://github.com/matplotlib/matplotlib/pull/17636, https://github.com/matplotlib/matplotlib/pull/10613, https://github.com/matplotlib/matplotlib/pull/10133
Where the issues tend to show up is if you have enough dynamic range that the small end is less than the difference between adjacent representable numbers at the high end, e.g.:

In [5]: 1e16 == (1e16 + 1)
Out[5]: True

This issue would crop up if you had, e.g., uint64 images utilizing the full range.

In some cases the scaling / unscaling does not work out the way you wish it would. While it is possible that the issues we are having are related to what we are doing with the results, forcing to [0, 1] restricts you to ~15 orders of magnitude on the whole image, which seems not ideal. While it may not be common, the fact that Matplotlib got those bug reports says we do have users with such extreme dynamic range in the community!

15 orders of magnitude is enormous! We don't support uint64 images, and uint32 is OK still on this front if you use `float64` for calculations. Note that all our floating point operations internally currently happen with float64 anyway, and this is pretty much the best you can do here.

The other issue you mention is due to interpolation that sometimes goes outside the desired range; but this is an expected artifact of interpolation (which we typically have the `clip` flag for).

To be clear, I'm trying to get at the underlying issues here and identify them, not to dismiss your concerns!

Best regards, Stéfan _______________________________________________ scikit-image mailing list -- scikit-image@python.org To unsubscribe send an email to scikit-image-leave@python.org https://mail.python.org/mailman3/lists/scikit-image.python.org/ Member address: jni@fastmail.com

And I meant making preserve_range a required argument x) On Wednesday, July 28, 2021, Riadh Fezzani <rfezzani@gmail.com> wrote:

[…] preserve range on float inputs quite a while ago at this point, and it would be pretty annoying to users to switch it back again. I can see some argument for keeping the current scaling for the integer cases, but I still have a feeling it is likely better to not force rescaling there either. Having data unexpectedly rescaled was probably the most annoying aspect to me as a user in medical imaging applications. An additional point in favor of not rescaling is consistency with scipy.ndimage.

[…] page (>20k visitors in the past 6 months according to our web metrics). It is the most-visited of the user guide pages, but not as visited as the installation page or some of the example and API pages.

[…] the dynamic range, sometimes negative because some weird amplifier randomly would suck off electrons, who knows what the engineers are cooking with ... ;) Then the user should be pointed to workarounds, not left alone with just an error message that the format doesn't match the algorithm. I think it might be much more of a discussion for the maintainers which way to minimize a convolution of maintainer_effort with user_pain, but honestly, knowing how hard it is to find extra time for a passion volunteer-effort project, I'd almost always go for "least effort", because I think the community will come around (as can be seen with cv2 and other examples). […] planetary science tools for data retrieval and data reading coming out, so I'm kinda feeling now how much damned work it is to design tools for the "community"...
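The dynamic-range point in the exchange above is easy to verify with NumPy: near 1e16 the spacing between adjacent float64 values already exceeds 1, and float64's relative precision (`eps`) is where the "~15 orders of magnitude in [0, 1]" limit comes from.

```python
import numpy as np

# float64 carries ~15-16 significant decimal digits; at 1e16 adjacent
# representable values are 2 apart, so adding 1 is a no-op.
print(1e16 == 1e16 + 1)          # True
print(np.spacing(1e16))          # 2.0

# Relative precision: values smaller than max * eps are indistinguishable
# from rounding noise, capping usable dynamic range at ~15 orders of magnitude.
print(np.finfo(np.float64).eps)  # ~2.22e-16
```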
On Tue, Jul 20, 2021 at 2:11 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Tue, Jul 20, 2021, at 05:32, Gregory Lee wrote:
This prevents the Hinsen-style breakage because "import skimage" would immediately fail in 1.0, but old codes and existing libraries could be easily adapted to use skimage2.skimage until they have migrated to skimage2. It is also fine to remove "skimage2.skimage" in the next major release without causing silent breakage. Aesthetically, "import skimage2" is a little worse than "import skimage", but not something I am too concerned about.
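The adaptation described here could be sketched as a small downstream shim. Everything below is hypothetical: the `skimage2` package and its `skimage2.skimage` legacy namespace are the proposal under discussion, not released code.

```python
import importlib

def load_legacy_skimage():
    """Return the old-API module, wherever it lives.

    Tries the hypothetical skimage2.skimage compatibility namespace first,
    then falls back to the 0.x scikit-image package; returns None if
    neither is installed.
    """
    for name in ("skimage2.skimage", "skimage"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

mod = load_legacy_skimage()
```

Because `import skimage` would fail outright in the new scheme, a shim like this fails loudly (returning `None`) rather than silently computing different results, which is exactly the Hinsen-breakage property being argued for.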
To be fair to Juan, I think this was one of his initial suggestions, but some of us balked at the thought of renaming the library or maintaining two versions. Hence the suggested technical footwork.
But, reading the arguments here, I am convinced that the only way to avoid Hinsen-type changes AND give programmatic errors for all future versions is to change the import name.
Indeed, there is already a brief discussion of this alternative under "New package naming" within the *Alternatives* section of SKIP3. It was not discussed in the context of this Hinsen-style breakage there, though, so I thought it was worth noting that as one point in its favor. One point listed against it in the SKIP is the potential to increase fragmentation by reducing pressure on downstream libraries to update to the new API.
Then, there is the question of whether to support the existing API inside of `skimage2`. My gut feel is to make different packages (`pip install scikit-image` becomes `pip install skimage2`), and to let people hang on to `import skimage` until they are ready to `import skimage2`. We can also backport bugfixes for a while.
This is getting into the weeds, but if we go this route we should probably match the version numbers --- `skimage 2.0` imports as `skimage2` and simply skip 1.0.
Stéfan
participants (12)

- Alexandre de Siqueira
- Gregory Lee
- Josh Warner
- Juan Nunez-Iglesias
- K.-Michael Aye
- Matt Newville
- Matthew Brett
- Nelle Varoquaux
- Riadh
- Riadh Fezzani
- Stefan van der Walt
- Thomas Caswell