
On 03/01/15 20:49, Nathaniel Smith wrote:
> The post you cite brings this up explicitly:
>
> [3] http://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/
>
> I have huge respect for the problems and pain that Konrad describes in this blog post, but I really can't agree with the argument or the conclusions. His conclusion is that when it comes to compatibility breaks, slow-incremental-change is bad, and that we should instead prefer big all-at-once compatibility breaks like the Numeric->Numpy or Py2->Py3 transitions. But when describing his own experiences that he uses to motivate this, he says: ...
There are two different scenarios to consider here, and perhaps I didn't make that distinction clear enough. One scenario is that of a maintained library or application that depends on NumPy. The other is a set of scripts written for a specific study (say, a thesis) that is then published for its documentation value. Such scripts are in general not maintained.

In the first scenario, gradual API changes work reasonably well, as long as the effort involved in applying the fixes is sufficiently minor that developers can integrate them into their routine maintenance work. That is the situation I have described from my own past experience as a library author.

It's the second scenario where gradual changes are a real problem. Suppose I have a set of scripts from a thesis published in year X, and I need to understand them in detail in year X+5 for a related scientific project. If a script produces different results with NumPy 1.7 and NumPy 1.10, which result should I assume the author intended? People rarely write down which versions of their dependencies they used. Yes, they should, and it takes only a couple of lines (see the first sketch at the end of this message), but in practice it's a pain to do, in particular when you work on multiple machines and don't manage the Python installation yourself. In this rather frequent situation, the published scripts are ambiguous - I can't really know what they did when the author ran them.

There is a third scenario where this problem shows up: outdated legacy system installations, which are particularly frequent on clusters and supercomputers. For example, the cluster at my lab runs a CentOS version that is a few years old. CentOS is known for its conservatism, and therefore the default Python installation on that machine is based on Python 2.6, with correspondingly old NumPy versions. People do install recent application libraries there. Suppose someone runs code there that assumes the future semantics of diagonal(): it will silently yield wrong results (the second sketch at the end of this message shows how).

In summary, my point of view on breaking changes is:

1) Changes that can make legacy code fail can be introduced gradually. The right compromise between stability and progress must be figured out by the community.

2) Changes that yield different results for unmodified legacy code should never be allowed.

3) The best overall solution for API evolution is a version number that is visible in client code and that changes whenever a breaking change is introduced. This guarantees point 2). (The third sketch at the end of this message shows a client-side approximation.)

So if the community decides that it is important to change the behavior of diagonal(), this should be done in one of two ways:

a) Deprecate diagonal() and introduce a differently-named method with the new functionality. This will make old code fail rather than produce wrong results.

b) Accumulate this change with other such changes and call the new API "numpy2".

Konrad.
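
P.S. Here are the sketches referred to above. First, recording dependency versions: a minimal sketch that uses only sys.version and numpy.__version__; how and where to store the information alongside the results is of course up to the script author.

    import sys
    import numpy

    # Print the interpreter and NumPy versions next to the script's
    # results, so that a reader years later knows which semantics
    # the output reflects. Plain string concatenation keeps these
    # lines working under both Python 2 and Python 3.
    print("Python " + sys.version.split()[0])
    print("NumPy  " + numpy.__version__)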
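
Second, the diagonal() hazard made concrete. The per-version behavior in the comments is my reading of the transition plan and should be checked against the respective release notes:

    import numpy as np

    a = np.arange(9.0).reshape(3, 3)
    d = a.diagonal()

    # The next line is written for the announced final semantics,
    # in which diagonal() returns a writable view and the assignment
    # zeroes the diagonal of 'a'. On other versions it does
    # something else entirely:
    #   - pre-1.7: d is a copy, the write succeeds silently and 'a'
    #     stays unchanged - exactly the silent-wrong-results case;
    #   - 1.7/1.8: d is still a copy, but the write emits a warning;
    #   - 1.9: d is a read-only view and the write raises an error.
    d[:] = 0.0
    print(a)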
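
Third, a client-side approximation of point 3). A script cannot give NumPy a versioned API, but it can at least fail fast on a version mismatch instead of silently computing with different semantics. This uses only the public numpy.__version__ attribute; the expected version is just an example:

    import numpy

    # The NumPy major/minor series this script was written and
    # tested against. Refusing to run under any other series turns
    # a silent change of semantics into an explicit error.
    EXPECTED = (1, 9)

    actual = tuple(int(x) for x in numpy.__version__.split(".")[:2])
    if actual != EXPECTED:
        raise RuntimeError("script written for NumPy %d.%d, found %s"
                           % (EXPECTED + (numpy.__version__,)))

This inverts the responsibility - the script checks instead of the API declaring its version - but it does guarantee point 2) for that one script.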