On Sat, Jul 21, 2018 at 7:15 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Jul 21, 2018 at 4:48 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Hi all,
Here is a first draft of a NEP on backwards compatibility and deprecation policy. This I think mostly formalized what we've done for the last couple of years, however I'm sure opinions and wish lists will differ here.
Oh *awesome*, thanks for putting this together.
I think this is a great start, but I'd structure it a bit differently. So let me just make a few high-level comments first and see what you think.
Regarding the "general principles" and then "policy": to me these feel like more a brainstorming list, that hasn't been fully distilled down into principles yet. I would try to structure it to start with the overarching principles (changes need to benefit users more than they harm them, numpy is widely used so breaking changes should by default be assumed to be fairly harmful, decisions should be based on data and actual effects on users rather than e.g. appealing to the docs or abstract aesthetic principles, silently getting wrong answer is much worse than a loud error), then talk about some of the ways this plays out (if people are currently silently getting the wrong answer -- which is the definition of a bug, but also shows up in the index-by-float case -- then that's really bad; some of our tools for collecting data about how bad a breakage is include testing prominent downstreams ourselves, adding warnings or making .0 releases and seeing how people react, etc.), and then examples.
Thanks, I'll try and rework the general principles, you have some excellent points in here.
Speaking of examples: I hate to say this because in general I think using examples is a great idea. But... I think you should delete most of these examples. The problem is scope creep: the goal for this NEP (IMO) should be to lay out the principles we use to think about these issues in general, but right now it comes across as trying to lay down a final resolution on lots of specific issues (including several where there are ongoing conversations). It ends up like trying to squish multiple NEPs into one, which makes it hard to discuss, and also distracts from the core purpose.
I'm not sure this is the best thing to do. I can remove a couple, but aiming to be "totally uncontroversial" is almost impossible given the topic of the NEP. The diag view example is important I think, it's the second most discussed backwards compatibility issue next to histogram. I'm happy to remove the statement on what should happen with it going forward though. Then, I think it's not unreasonable to draw a couple of hard lines. For example, removing complete submodules like linalg or random has ended up on some draft brainstorm roadmap list because someone (no idea who) put it there after a single meeting. Clearly the cost-benefit of that is such that there's no point even discussing that more, so I'd rather draw that line here than every time someone open an issue. Very recent example: https://github.com/numpy/numpy/issues/11457 (remove auto-import of numpy.testing).
My suggestion: keep just two examples, histogram and indexing-with-floats. These are safely done and dusted, totally uncontroversial (AFAIK), and the first is a good illustration of how one can try to be careful and do the right thing but still get it all wrong, while the second is a good example of (a) how we gathered data and decided that an actually pretty disruptive change was nonetheless worth it, and (b) how we had to manage it through multiple false starts.
Regarding the actual policy: One alteration to current practice jumped out at me. This policy categorically rules out all changes that could cause currently working code to silently start doing something wrong, regardless of the specific circumstances. That's not how we actually do things right now. Instead, our policy in recent years has been that such changes are permitted in theory, but (a) the starting presumption is that this is super harmful to users so we need a *very* good reason to do it, and (b) if we do go ahead with it, then during the deprecation period we use a highly-visible FutureWarning (instead of the invisible-by-default DeprecationWarning).
Personally I think the current policy strikes a better balance. You can see some examples of where we've used this by running 'git log -S FUTUREWARNING -S FutureWarning' -- it's things like a bad default for the rcond argument in lstsq, an obscure and error-prone corner case in indexing (0addc016ba), strange semantics for NaT (https://mail.scipy.org/pipermail/numpy-discussion/ 2015-October/073968.html), ... we could quibble about individual cases, but I think that taking these on a case-by-case basis is better than ruling them out categorically. And in any case, that is what we do now, so if you want to change this, it's something we should discuss and probably write down some rationale and such :-).
You're right here. Thanks for the examples. I'll update this according to your suggestion, and propose to use one of the examples (rcond probably) to illustrate.
Regarding the major version number thing: ugh do we really want to talk about this more. I'd probably leave it out of the NEP entirely. If it stays in, I think it needs a clearer description of what counts as a "major" change.
I think it has value to keep it, and that it's not really possible to come up with a very clear description of "major". In particular, I'd like every deprecation message to say "this deprecated feature will be removed by release X.Y.0". At the moment we don't do that, so if users see a message they don't know if a removal will happen next year, in the far future (2.0), or never. The major version thing is quite useful to signal our intent. Doesn't mean we need to exhaustively discuss when to do a 2.0 though, I agree that that's not a very useful discussion right now. Happy to remove this though if people don't like it. Other opinions? Cheers, Ralf
There are some examples of things that do "sound" major, but... the rest of our policy is all about measuring disruption based on effects on users, and by that metric, the index-by-float removal was pretty major. My guess is that by the time we finally remove np.matrix, the actual disruption will be less than it was for removing index-by-float. (As it should be, since keeping index-by-float around was actively causing bugs in even well-maintained downstreams, in a way that np.matrix doesn't.)