Hi Juan,

On Tue, Jul 20, 2021, at 18:27, Juan Nunez-Iglesias wrote:

Everyone agrees breaking the Hinsen rule (“avoid silent behavioural changes”) is bad. Over 4 versions, scikit-image has actually broken it a few times already. We certainly don’t want to break it en masse with 1.0.

Yes, we shouldn't do that (or have done that) 😬

Riadh thinks, correct me if I’m wrong Riadh, that breaking it for 1.0 is ok, *especially* given the 0.20 warning. To be honest, one thing I like about the 0.20 warning is that it will *teach* people to pay attention to version numbers. The other plans, not so much. And these are important not just in this skimage transition but throughout the ecosystem. The Hinsen rule is broken dozens of times across the ecosystem. Even NumPy allows this over “long” deprecation periods. (See the copy=’never’ discussion.) But, back to summaries.

Maybe you can teach people just as much this way. You can, e.g., have `import skimage` with 2.0 installed raise an error that explains exactly what's going on ("you should install skimage2 and switch to the new skimage2 API").
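A rough sketch of what such a failing import could look like, assuming a hypothetical release whose `skimage/__init__.py` contains nothing but the code below. The message wording, the `refuse_import` helper, and the `skimage2` name are illustrative only, not an agreed design:

```python
# Hypothetical contents of skimage/__init__.py in a release that no longer
# ships the old API. Nothing here is existing scikit-image code.

MIGRATION_MESSAGE = (
    "The `skimage` import name only ever provided the old API, which this "
    "release no longer ships. Install `skimage2` and switch to the new "
    "skimage2 API, or pin an older scikit-image release to keep the old API."
)


def refuse_import():
    """Fail loudly instead of silently changing behaviour."""
    raise ImportError(MIGRATION_MESSAGE)


# In the real package this call would sit at module top level, so that
# `import skimage` itself raises:
# refuse_import()
```

The point is that old code either runs against the old API or stops with an explanation; it never runs silently against the new one.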

NumPy is not breaking the Hinsen rule with the `copy='never'` decision (hence the long discussion). Currently, `True` means "always copy" and `False` means "copy if necessary". If that gets changed, then `False` will become *more strict* (i.e., it raises rather than returning invalid results), and if the enum is used instead there will be no silent regression. A concern raised in that discussion is that *newer* code might break on *older* versions of NumPy, which is an important consideration but a much less common scenario than the other way around.

A different issue is API breakage. This is not as bad as the Hinsen rule but it can also be bad for user goodwill.

I agree that we should try to minimize breakage as far as possible, but also use this opportunity to ensure that everything is consistent; that will benefit us and the users greatly in the long run.

- the SKIP:

old API import: unavailable at 1.0
new API import: skimage
old API package: scikit-image <1.*
new API package: scikit-image 1.*

Pros:
* uses semver correctly
* with enough warning, lets users pin their dependencies intentionally, improving the reproducibility of their packages
Cons:
* users who don’t run their code in the transition period won’t be warned
* if we don’t break the API enough, risks breaking Hinsen rule
* if we break it completely, risks annoying users

It feels here as though we are saying: we need to break the API enough so users know what's going on, but not so much that we drive them insane. That is a fine balance, and it may be easier to be explicit.

If the principle is that we want to ensure that user code from a while ago will run correctly or break, then it makes it easier to rule out some options.

After all, two of our values are: (a) consistent API and (b) ensuring correctness.

- the frozen package:

old API import: skimage
new API import: skimage.v1
old API package = new API package = scikit-image 1.*

Pros:
* lets anyone migrate to new API at their own pace
* existing code and (more important, imho) existing StackOverflow answers etc continue to work
Cons:
* no pressure for anyone to move to new API
* could take years to get people to migrate, splintering the community
* ability to mix code between APIs could give *very* confusing results

Maintaining two versions of the API in one package, while borrowing code to and fro between them, is a complex operation to get right. If you modify the newer version, you have to test very carefully to ensure that the older version doesn't change as well. If you only backport bugfixes to a separate package, that becomes a lot easier (if a bit labor intensive).
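To make the testing burden concrete, here is a toy sketch, assuming two API generations that share a private helper. All names (`_rescale`, `old_normalize`, `new_normalize`, `OLD_API_GOLDEN`) are made up for illustration and are not scikit-image functions:

```python
# Toy model of two API generations living in one package and sharing code.
# Every name here is hypothetical.

def _rescale(values, out_max):
    """Shared helper used by both the old and the new API."""
    peak = max(values)
    return [v / peak * out_max for v in values]


def old_normalize(values):
    """Old API: output scaled to [0, 255]."""
    return _rescale(values, 255)


def new_normalize(values):
    """New API: output scaled to [0, 1]."""
    return _rescale(values, 1.0)


# Reference outputs recorded from the old API before the split.
OLD_API_GOLDEN = {(1, 2, 4): [63.75, 127.5, 255.0]}


def check_old_api_frozen():
    """Fail if a change to shared code altered old-API results."""
    for inputs, expected in OLD_API_GOLDEN.items():
        assert old_normalize(list(inputs)) == expected
```

Any edit to `_rescale` made for the sake of the new API has to keep `check_old_api_frozen()` passing, and you need a check like this for every old function that shares code.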

- the new name, new import package:

old API import: skimage
new API import: skimage2
old API package: scikit-image (any version)
new API package: skimage2 (any version)

This is my preference (although I don't care so much about the new API package name).

Pros:
* Clear distinction between APIs both on the dependency level and import level
* Clear when reading someone’s code what version they are using
* no risk of breaking the Hinsen rule
* FINALLY, our package name matches our import name 🎉
Cons:
* marginally more annoying import
* confusing for package managers — see e.g. ‘conda install pyqt’ vs ‘pip install pyqt5’

If it is too annoying we can always keep `scikit-image` as the install name. Although I think it's easier to communicate with packagers than it is with users (there are fewer of them, at least :).

* potentially slow migration as users might not quickly become aware of skimage2

I wonder if a warning on import of scikit-image 1.x, telling users about skimage2, is considered bad form?
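Something along these lines is what I have in mind, as a minimal sketch; the `warn_about_skimage2` helper and the message text are hypothetical. I picked `FutureWarning` because Python shows it to end users by default, whereas `DeprecationWarning` is hidden by default unless it is triggered by code running in `__main__`:

```python
import warnings


def warn_about_skimage2():
    """Point users of the old import name at the new package."""
    warnings.warn(
        "The `skimage` import provides the old API, which only receives "
        "bug fixes. New development happens in `skimage2`.",
        FutureWarning,
        stacklevel=2,  # attribute the warning to the caller
    )


# In the hypothetical skimage/__init__.py this would run at import time:
# warn_about_skimage2()
```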

* what do we do for subsequent versions? e.g. opencv-python is at version 4.5 but imports as cv2 🤦‍♂️

skimage v2.199.0 until we need another refactor? :)

I’ll make one more note about the SKIP: one option is to not release 1.0 for another full year, even two: i.e., we keep the warning versions for longer, together with 1.XrcY. This should give ample warning and time for people to either pin or migrate.

We can always hope to communicate changes adequately. But the most certain way of doing so is to let the code do the talking. If the code doesn't work, and lets the user know why, no-one will mistakenly use the wrong package. No-one has to read release notes/mailing lists, reinstall in a certain time-frame, or accidentally upgrade to the tripwire version.

I also fear what will happen if beginner users run into the pin-or-migrate solution. The simpler the technical solution we can come up with, the less likely it is to trip up our users.

Stéfan