Everyone agrees breaking the Hinsen rule (“avoid silent behavioural changes”) is bad. Over 4 versions, scikit-image has actually broken it a few times already. We certainly don’t want to break it en masse with 1.0.

Riadh thinks, correct me if I’m wrong Riadh, that breaking it for 1.0 is ok, *especially* given the 0.20 warning. To be honest, one thing I like about the 0.20 warning is that it will *teach* people to pay attention to version numbers. The other plans, not so much. And these are important not just in this skimage transition but throughout the ecosystem. The Hinsen rule is broken dozens of times across the ecosystem. Even NumPy allows this over “long” deprecation periods. (See the copy=’never’ discussion.) But, back to summaries.

A different issue is API breakage. This is not as bad as the Hinsen rule but it can also be bad for user goodwill.

An idea that seems to be gaining momentum is to change the import name, but it’s unclear whether people favour keeping the old import name around or moving it to v0, and it’s also unclear whether people favour moving the PyPI *package* name, with scikit-image frozen forever in 0.19. So here are some named options:

————————

- the SKIP:

old API import: unavailable at 1.0

new API import: skimage

old API package: scikit-image <1.*

new API package: scikit-image 1.*

* uses semver correctly

* with enough warning, lets users pin their dependencies intentionally, improving the reproducibility of their packages

* users who don’t run their code in the transition period won’t be warned

* if we don’t break the API enough, risks breaking Hinsen rule

* if we break it completely, risks annoying users

- the frozen package:

old API import: skimage

new API import: skimage.v1

old API package = new API package = scikit-image 1.*

* lets anyone migrate to new API at their own pace

* existing code and (more important, imho) existing StackOverflow answers etc continue to work

* no pressure for anyone to move to new API

* could take years to get people to migrate, splintering the community

* ability to mix code between APIs could give *very* confusing results

- the versioned package:

old API import: skimage.v0

new API import: skimage.v1

(skimage by itself errors)

old API package = new API package = scikit-image 1.*

* Forces people to be intentional about their API choice *or* simply pin

* no risk of breaking Hinsen rule

* minimal pressure to move to v1

* could take years for people to migrate, splintering the community

* ability to mix code between APIs could give *very* confusing results
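To make “skimage by itself errors” concrete, here is a rough sketch of one way it could work. It is only an assumption on my part: a module-level `__getattr__` (PEP 562) that refuses bare attribute access while leaving the `v0` and `v1` submodules usable. The names `make_versioned_package`, `skimage_demo` and `threshold` are made up for illustration; in a real package the hook would simply live in the top-level `__init__.py`.

```python
import types


def make_versioned_package(name):
    """Build a toy stand-in for a package whose bare namespace refuses to be used."""
    pkg = types.ModuleType(name)
    pkg.v0 = types.ModuleType(f"{name}.v0")  # old API
    pkg.v1 = types.ModuleType(f"{name}.v1")  # new API

    def __getattr__(attr):
        # PEP 562: called only when normal attribute lookup on the module fails.
        raise AttributeError(
            f"{name}.{attr} is not available; choose an API explicitly via "
            f"{name}.v0 or {name}.v1, or pin your dependency."
        )

    pkg.__getattr__ = __getattr__
    return pkg


# Hypothetical package and function names, for illustration only.
demo = make_versioned_package("skimage_demo")
demo.v0.threshold = lambda image: "v0 result"
demo.v1.threshold = lambda image: "v1 result"

try:
    demo.threshold  # bare access, no explicit API version chosen
    bare_access_failed = False
except AttributeError:
    bare_access_failed = True
```

Code that names its API version keeps working, and anything relying on the bare namespace fails loudly instead of silently picking one.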

- the new name, new import package:

old API import: skimage

new API import: skimage2

old API package: scikit-image (any version)

new API package: skimage2 (any version)

* Clear distinction between APIs both on the dependency level and import level

* Clear when reading someone’s code what version they are using

* no risk of breaking the Hinsen rule

* FINALLY, our package name matches our import name 🎉

* marginally more annoying import

* confusing for package managers — see e.g. ‘conda install pyqt’ vs ‘pip install pyqt5’

* potentially slow migration as users might not quickly become aware of skimage2

* what do we do for subsequent versions? e.g. opencv-python is at version 4.5 but imports as cv2 🤦‍♂️

- new name, same import:

old API import: skimage

new API import: skimage

old API package: scikit-image (any version)

new API package: skimage[2] (any version)

(noting here that we have skimage available and unused right now, though this might just be confusing given scikit-learn.)

* Get to keep skimage import

* FINALLY, our package name matches our import name 🎉

* No pressure for people to migrate

* No pressure for people to pin their package dependencies

* Unclear when reading code whether they are using skimage<1. Together with the previous two points, this is a deal breaker for me.

————————

Any option I haven't covered?

What do people prefer? After writing all of them out, my preferences oscillate between the SKIP and new name, new import.

I’ll make one more note about the SKIP: one option is to not release 1.0 for another full year, or even two, i.e. we keep the warning versions for longer, together with 1.XrcY. This should give ample warning and time for people to either pin or migrate.

On 21 Jul 2021, at 7:56 am, Stefan van der Walt <stefanv@berkeley.edu> wrote:

On Tue, Jul 20, 2021, at 14:41, Riadh wrote:
I found this presentation from Hinsen, https://calcul.math.cnrs.fr/attachments/spip/IMG/pdf/cours_reproductibilite.pdf, and, correct me if I am wrong, I found nothing in it that contradicts the proposed SKIP; please see the 3rd bullet in slide 14. My understanding of the Hinsen rule is that it is less about forbidding API breaks than about saving dependency version numbers...

Let's make the distinction clear:

An API break is when you do something like this:

v0: foo(x, rescale=True) -> out
v1: foo(x, rescale=True) -> error, rescale is not a valid keyword argument

The "Hinsen rule" (which is just a handle on the following concept; we could just as well call it "silent behavioral changes") is:

v0: foo(x) == y
v1: foo(x) != y

This last behavior change is a big problem, and one that should be avoided at all costs. The former is not so serious, because users get feedback when their code fails. Once they fix it up, it works again and works correctly.

If you do not warn your users when behavior changes, then their code can deliver the wrong results without them knowing---and this is what we need to avoid.
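As a runnable illustration of the two cases, here is a minimal Python sketch. The three functions are hypothetical stand-ins for `foo` at different versions, not anything in skimage, and the "rescale to [0, 1]" behaviour is assumed purely for the example.

```python
# Hypothetical stand-ins for foo() at different versions; not skimage functions.

def foo_v0(x, rescale=True):
    """v0: rescales 8-bit values to [0, 1] by default."""
    return [v / 255 for v in x] if rescale else list(x)


def foo_v1_api_break(x):
    """v1 with an API break: the rescale keyword no longer exists."""
    return list(x)


def foo_v1_silent(x, rescale=False):
    """v1 with a silent change: same signature, but the default flipped."""
    return [v / 255 for v in x] if rescale else list(x)


x = [0, 51, 255]

# API break: the old call fails loudly, so the user notices and fixes it.
try:
    foo_v1_api_break(x, rescale=True)
    api_break_detected = False
except TypeError:
    api_break_detected = True

# Silent change: the same call runs without complaint but returns other values.
silently_different = foo_v0(x) != foo_v1_silent(x)
```

The first case costs the user a traceback; the second costs them a wrong result that nothing ever reports.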

Stéfan

P.S. With our existing deprecation mechanism, it is possible to hit a point in the future where we accidentally have the same API calls return different results. We should be cognizant of that failure mode, and avoid it. One way is to never modify the "expected" output part of tests without careful consideration.
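The failure mode in the P.S. can be sketched as follows. This assumes a common deprecation pattern (a sentinel default plus a `FutureWarning`) and is not a copy of scikit-image's actual mechanism; `foo_transition`, `foo_after` and `rescale` are hypothetical names.

```python
import warnings

_UNSET = object()  # sentinel: distinguishes "not passed" from an explicit value


def foo_transition(x, rescale=_UNSET):
    """Hypothetical transition release: old default still applies, but warns."""
    if rescale is _UNSET:
        warnings.warn(
            "The default of `rescale` will change from True to False; "
            "pass it explicitly to keep the current behaviour.",
            FutureWarning,
            stacklevel=2,
        )
        rescale = True
    return [v / 255 for v in x] if rescale else list(x)


def foo_after(x, rescale=False):
    """Hypothetical later release: the warning is gone and the default flipped."""
    return [v / 255 for v in x] if rescale else list(x)


x = [0, 51, 255]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    during = foo_transition(x)                  # warns, returns the old result
    explicit = foo_transition(x, rescale=True)  # explicit argument, no warning

after = foo_after(x)  # same call as `during`, different result, no warning
```

Anyone who never ran their code against the transition release goes straight from the old result to `after` without seeing a warning, which is why a test that pins the expected output of `foo(x)` is a useful guard.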