Dear scikit-image community,

Last week, Stéfan van der Walt, Emmanuelle Gouillart, Alexandre de Siqueira, and I met at UC Berkeley to discuss the future of the library. For part of the meeting, Kira Evans from napari and Tom Caswell and Hannah Aizenman from Matplotlib also attended, and provided valuable feedback.

Our unstructured notes are available here:
https://github.com/scikit-image/meeting-notes/blob/master/2020/2020-02-27--B...

The short of it is that we agree that it is a good time to aim for a 1.0 release. As a team, we have developed an increasingly good idea of which parts of the library are pain points and which work well. Additionally, for the next year, thanks to CZI, we have ~1.5 people paid to work on scikit-image, so the hard work of major API restructuring is achievable.

Our proposal is that **scikit-image 1.0 will contain breaking changes.** This is because, in many cases in our API, we want to change the return value of a function without changing its signature. This is difficult to achieve with deprecations — it typically involves creating a new, mostly useless keyword argument to signal the new behavior, which later needs to be removed. Additionally, allowing breaking changes will let us "clean up shop" and clean up the API a *lot*.

So, how will we minimize the impact of this proposal on the community?

* this email: we want your feedback on this proposal! Ultimately, this will become a SKIP that will need to be approved by the core developers: https://scikit-image.org/docs/dev/skips/0-skip-process.html In the meantime, nothing is set in stone.
* a transition guide: for every change, analyze how user code could be affected (errors, strange results), and produce an item in the guide explaining how to transition 0.x code to 1.x code.
* extreme communication: we plan to announce our intentions far and wide, including on this mailing list, to others in the ecosystem, on Twitter, and at conferences. For users who just want their code to keep working, pinning to "<1.0" will prevent their code's behavior from changing without warning.
* pre-releases: we will make one or more pre-releases of 1.0 so that users can test their code before the release officially comes out.
* feature parity: we will time 1.0 to coincide with a 0.x release, so that users can keep using the 0.x series without missing out on mission-critical features.

In the coming days, I will be creating a few issues related to this proposal on GitHub at https://github.com/scikit-image/scikit-image, together with a corresponding GitHub Project. You can find all issues related to this transition by using the "1.0" tag. Those will be the right place to discuss the specific API changes we are considering. This thread or the meta-issue on GitHub will be the place to discuss the process of the 1.0 release overall.

Questions? Concerns? Write us an email, catch us on Zulip, or comment on related issues on GitHub.

Thanks for reading!

Juan.
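[Editor's note: the "<1.0" pin mentioned above can be expressed directly on the pip command line or in a requirements file; a minimal sketch:]

```shell
# Stay on the scikit-image 0.x series until you are ready to migrate to 1.0:
pip install "scikit-image<1.0"

# or, equivalently, as a line in requirements.txt:
#   scikit-image<1.0
```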
* feature parity: we will time 1.0 to be simultaneous with a 0.x release, so that users can keep using the 0.x series without missing out on mission-critical features.
From our experience going from Matplotlib 1.x -> 2.0, this is likely going to be the hardest part of what you are proposing. I suspect that the two branches will drift apart (making backports more difficult) and that the 1.0 release will take far longer than you expect, leaving you committed to keeping feature parity between two diverging branches for far longer than you thought you would ;)

Tom
--
Thomas Caswell
tcaswell@gmail.com
On Mon, Mar 2, 2020 at 9:30 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
Our proposal is that **scikit-image 1.0 will contain breaking changes.** This is because, in many cases in our API, we want to change the return value of a function without changing its signature.
Just in case you're not aware of it, there's a nice Bunch design pattern that we've been discussing for a long time for scipy.stats, for the same reason of adding more return values. It allows you to do this in a backwards-compatible way; see https://github.com/scipy/scipy/issues/3665#issuecomment-451038177. The main idea is to freeze the number of values returned by tuple unpacking, and have namedtuple-like behavior otherwise (attribute access being preferred for new code).

Besides not breaking user code, the advantage would be the ability to make changes incrementally, also post-1.0.

Cheers,
Ralf
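[Editor's note: a minimal sketch of the Bunch pattern Ralf describes. The `TtestResult` name and fields below are illustrative, not an actual scipy API; the point is that tuple unpacking stays frozen at the original fields while new results are reachable only as attributes.]

```python
from collections import namedtuple

# Base namedtuple fixes the fields available via tuple unpacking.
_TtestBase = namedtuple('_TtestBase', ['statistic', 'pvalue'])


class TtestResult(_TtestBase):
    """Hypothetical result object: unpacks to two values, forever."""

    def __new__(cls, statistic, pvalue, confidence_interval=None):
        # Only the original two fields participate in tuple behavior.
        self = super().__new__(cls, statistic, pvalue)
        # New results are attribute-only, so old unpacking code keeps working.
        self.confidence_interval = confidence_interval
        return self


result = TtestResult(2.5, 0.01, confidence_interval=(1.1, 3.9))
statistic, pvalue = result        # pre-existing code: still unpacks two values
ci = result.confidence_interval   # new code: opt-in via named attribute
```

Because the subclass does not define `__slots__`, instances get a `__dict__`, which is what makes the extra attribute assignment legal on a namedtuple subclass.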
Thanks for the input, Ralf!
On 5 Mar 2020, at 9:04 am, Ralf Gommers <ralf.gommers@gmail.com> wrote:

Just in case you're not aware of it, there's a nice Bunch design pattern that we've been discussing for a long time for scipy.stats, for the same reason of adding more return values. It allows you to do this in a backwards-compatible way, see https://github.com/scipy/scipy/issues/3665#issuecomment-451038177. The main idea is to freeze the number of arguments returned by tuple unpacking, and have namedtuple-like behavior otherwise (which is preferred for new code).
Besides not breaking user code, the advantages would be to be able to make changes incrementally, also post-1.0.
I might not be getting the full picture here: returning Bunches instead of (in most places) NumPy arrays would in itself be a breaking change?

Similarly, one of the big changes we are proposing is returning an array of floats that is *not* rescaled to [0, 1]. That is, we want to still return a NumPy array (whether that is the plain array or an attribute in the Bunch), but the values in the array will be different. I don’t clearly see how Bunch solves that problem?

Finally, it seems to me that dataclasses, being now (3.7) in the standard library and offering a few more features, might be a preferred option for this use case?

Thank you!

Juan.
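[Editor's note: a minimal sketch of the dataclass alternative Juan mentions, in the standard library since Python 3.7. The `FilterResult` name and fields are illustrative, not a scikit-image API. One trade-off worth noting: unlike the Bunch pattern, a plain dataclass is not iterable, so existing `a, b = result` tuple-unpacking code would break.]

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FilterResult:
    """Hypothetical result object with named, typed fields."""
    image: np.ndarray
    threshold: float


result = FilterResult(image=np.zeros((2, 2)), threshold=0.5)
result.threshold          # attribute access works as with a Bunch
# a, b = result           # but this raises TypeError: not iterable
```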
On Wed, Mar 4, 2020, at 15:01, Juan Nunez-Iglesias wrote:
I might not be getting the full picture here: returning Bunches instead of (in most places) NumPy arrays would in itself be a breaking change? Similarly, one of the big changes we are proposing is returning an array of floats that is *not* rescaled to [0, 1]. That is, we want to still return a NumPy array (whether that is the plain array or an attribute in the Bunch) but the values in the array will be different. I don’t clearly see how Bunch solves that problem?
There are at least two considerations here:

- We want to stop coercing image data to a certain range based on the data type of the input. This will be a breaking change, overall, unless we introduce the `preserve_range` keyword widely.

- We would like to make, consistently, most functions be of the form:

      output_image = function(input_image)

  This makes for easy construction of pipelines:

      output_image = first_func(second_func(third_func(input_image)))

  In cases where additional calculations are made, we want signatures of the form:

      output_image, additional_namedtuple = function.extra(input_image)

[Exactly how the calculation of additional_namedtuple is triggered is not set in stone; the `.extra` attribute on functions was one suggestion of how to do that easily.]

The usage of named tuples / bunches / data objects will be an integral part of the design.

Best regards,
Stéfan
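[Editor's note: a toy sketch of what the `.extra` suggestion could look like mechanically. The `threshold` function, its helper, and the namedtuple fields are all hypothetical; this is not a scikit-image implementation, only an illustration that functions are objects and can carry a richer variant as an attribute.]

```python
from collections import namedtuple

import numpy as np

# Hypothetical container for the additional results.
ThresholdExtra = namedtuple('ThresholdExtra', ['threshold'])


def _threshold_impl(image):
    """Compute a (toy) threshold and return both outputs."""
    t = image.mean()  # illustrative threshold choice only
    return (image > t).astype(float), ThresholdExtra(threshold=t)


def threshold(image):
    """Plain, pipeline-friendly form: image in, image out."""
    out, _ = _threshold_impl(image)
    return out


# Attach the richer variant, as in the `.extra` suggestion above.
threshold.extra = _threshold_impl

image = np.array([[0.1, 0.9], [0.2, 0.8]])
binary = threshold(image)               # composes cleanly in pipelines
binary2, info = threshold.extra(image)  # opt-in access to extra results
```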
I want to strongly suggest against having two branches. That said, the LTS release really took a lot of maintenance power, and involved a similar "feature/bugfix" parity. Do I understand correctly that "strict deprecation" is something like what we have been doing already?
On Thu, Mar 5, 2020 at 1:59 AM Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Wed, Mar 4, 2020, at 15:01, Juan Nunez-Iglesias wrote:
I might not be getting the full picture here: returning Bunches instead of (in most places) NumPy arrays would in itself be a breaking change? Similarly, one of the big changes we are proposing is returning an array of floats that is *not* rescaled to [0, 1]. That is, we want to still return a NumPy array (whether that is the plain array or an attribute in the Bunch) but the values in the array will be different. I don’t clearly see how Bunch solves that problem?
It doesn't, indeed; nothing can solve that for you.
There are at least two considerations here:
- We want to stop coercing image data to a certain range based on the data type of the input. This will be a breaking change, overall, unless we introduce the `preserve_range` keyword widely.
I did not get that from "we want to change the return value of a function". I assumed it was adding new return values. It seems like a *really* unhealthy idea to me to silently change numerical values. Despite the extensive communication, the vast majority of your users will not be aware of what's happening and run the risk of silently getting invalid results. I can't think of any important package in the SciPy/PyData ecosystem that has ever done what you're proposing after becoming popular. I would recommend to change the package name or name of the main namespace, to make sure people see an exception and become aware of the problem when they upgrade.
- We would like to make, consistently, most functions be of the form:
output_image = function(input_image)
This makes for easy construction of pipelines:
output_image = first_func(second_func(third_func(input_image)))
In cases where additional calculations are made, we want signatures of the form:
output_image, additional_namedtuple = function.extra(input_image)
[Exactly how the calculation of additional_namedtuple is triggered is not set in stone; the `.extra` attribute on functions was one suggestion of how to do that easily.]
The usage of named tuples / bunches / data objects will be an integral part of the design.
Thanks. Those changes all sound really useful. Cheers, Ralf
Best regards,
Stéfan
Hopefully I can expand a bit on the image range scaling point, with some historical perspective on image processing data handling and how we got to where we are today.

Long before scikit-image existed, image processing was often taught via Matlab with the appropriate toolbox. Curricula have generally opened with discussions about datatype; this was central to interacting with the underlying image data. A very old function provided in the Mathworks toolbox does this: *im2double*, which takes an integer array and hands you back a double array. However, *it is not a simple dtype conversion; this function rescales the input image from the original range to the range [0, 1]*. I believe the original reasoning for this was that certain image operations were simpler and more intuitive to consider/teach with unit range, e.g., exposure operations and inverting the image.

Scikit-Image has taken the position that we accept "[nearly] anything" as an input dtype, but we do not guarantee the output will match the input. If a modified image is returned (exposure-adjusted, transformed, denoised, etc.), generally integer images are accepted but floating point images are returned, to avoid inherent precision loss in pipelines. We check the input datatype to see if it needs to be changed for safety, and if necessary we do so. Datatype conversion currently takes an input integer image and rescales it to [0, 1] - Matlab style - by default. In many cases the rescaling step is *not optional*.

Rescaling to [0, 1] is expected by Matlab veterans but perennially confuses new users. More concerning, some image data has physical meaning (e.g., CT Hounsfield units), and for obvious reasons such users want to opt out of this behavior. We've made it possible to turn off rescaling, and some functions now expose a `preserve_range=` kwarg, but the current default behavior remains to silently rescale, for backwards compatibility.

We propose to globally remove the Matlab-style forced rescaling. Our functions would no longer assume a unit range, and would instead respect the input data range, even if conversion to float is required for safety. Many if not most users may not even notice this change; in the worst case it is a linear multiplicative scaling. For those who do, it is easy to retain the prior behavior by normalizing their images in preprocessing, or by using `img_as_float()` with an optional kwarg to enable legacy unit normalization.

Put differently, the current state is as if Scikit-Learn automatically whitened all input data *and you couldn't turn this off even if you wanted to*. Instead, Scikit-Learn strongly recommends whitening, but ultimately the user is responsible for their data. We want Scikit-Image to move to a similar model, where we do not impose Matlab-style rescaling on our users' data.

Josh
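[Editor's note: a small sketch of the two conversion behaviors Josh contrasts, for a uint8 input. This is illustrative NumPy only, not scikit-image's actual conversion code.]

```python
import numpy as np

image_u8 = np.array([0, 128, 255], dtype=np.uint8)

# Matlab im2double-style behavior: convert to float AND rescale so the
# dtype's full range maps onto [0, 1]. 0 -> 0.0 and 255 -> 1.0.
rescaled = image_u8.astype(np.float64) / 255

# Proposed behavior: convert to float for numerical safety, but preserve
# the original data range (important when values carry physical meaning,
# e.g. CT Hounsfield units).
preserved = image_u8.astype(np.float64)
```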
_______________________________________________ scikit-image mailing list -- scikit-image@python.org To unsubscribe send an email to scikit-image-leave@python.org
Thanks everyone! Two clarifications:
It seems like a *really* unhealthy idea to me to silently change numerical values. Despite the extensive communication, the vast majority of your users will not be aware of what's happening and run the risk of silently getting invalid results. I can't think of any important package in the SciPy/PyData ecosystem that has ever done what you're proposing after becoming popular. I would recommend to change the package name or name of the main namespace, to make sure people see an exception and become aware of the problem when they upgrade.
I want to point out that there are many changes in the API itself, so it is extremely unlikely that errors will be silent. The API will change, stuff will break, and users will need to check how to fix it, and they will (presumably) see that the values themselves have changed.

Having said this, I am not above changing our import to skimage2 or similar, so that users have to opt in to the transition. But I do resent having to import e.g. bs4, so I’d like to avoid this if at all possible. One option is to change all of our *second*-level imports, e.g. segmentation -> segment.

Second, Mark: 100% re two branches, that is *not* the proposal. The proposal is to:

1. Make a final 0.x release.
2. Have a feature freeze in order to release 1.0 with the exact same feature set as 0.x.

Juan.
On Thu, Mar 5, 2020, at 04:49, Ralf Gommers wrote:
It seems like a *really* unhealthy idea to me to silently change numerical values. Despite the extensive communication, the vast majority of your users will not be aware of what's happening and run the risk of silently getting invalid results. I can't think of any important package in the SciPy/PyData ecosystem that has ever done what you're proposing after becoming popular. I would recommend to change the package name or name of the main namespace, to make sure people see an exception and become aware of the problem when they upgrade.
Yes, this may indeed be pushing things too far. And, since we have the `preserve_range` keyword argument that could handle the issue, albeit with a long-ish deprecation, we should probably just do that. Stéfan
Yes, this may indeed be pushing things too far. And, since we have the `preserve_range` keyword argument that could handle the issue, albeit with a long-ish deprecation, we should probably just do that.
The problem is that we don’t want to have `preserve_range` at all in 1.0. So it is annoying to instruct everyone to add `preserve_range=True` to all their calls to avoid warnings, only to remove that keyword in the future.
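[Editor's note: a sketch of the transition-keyword pattern under discussion, to make the annoyance concrete. The function name and warning text are hypothetical; the shape of the dance, namely a temporary keyword whose unset default triggers a FutureWarning until the behavior flips, is the part being illustrated.]

```python
import warnings

import numpy as np


def some_filter(image, preserve_range=None):
    """Hypothetical filter showing the deprecation-keyword dance."""
    if preserve_range is None:
        # Users who have not opted in get nagged on every call until
        # the default changes, after which the keyword itself must be
        # deprecated and removed in turn.
        warnings.warn(
            "The default of preserve_range will change to True in 1.0; "
            "pass preserve_range explicitly to silence this warning.",
            FutureWarning,
        )
        preserve_range = False
    image = np.asarray(image, dtype=np.float64)
    if not preserve_range:
        image = image / image.max()  # legacy rescaling to [0, 1]
    return image
```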
participants (6)

- Josh Warner
- Juan Nunez-Iglesias
- Mark Harfouche
- Ralf Gommers
- Stefan van der Walt
- Thomas Caswell