
On 03/01/15 20:49, Nathaniel Smith wrote:
> The post you cite brings this up explicitly:
>
> [3] http://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/
>
> I have huge respect for the problems and pain that Konrad describes in this blog post, but I really can't agree with the argument or the conclusions. His conclusion is that when it comes to compatibility breaks, slow-incremental-change is bad, and that we should instead prefer big all-at-once compatibility breaks like the Numeric->Numpy or Py2->Py3 transitions. But when describing his own experiences that he uses to motivate this, he says: ...
There are two different scenarios to consider here, and perhaps I didn't make that distinction clear enough. One scenario is that of a maintained library or application that depends on NumPy. The other is a set of scripts written for a specific study (say, a thesis) that is then published for its documentation value. Such scripts are in general not maintained.

In the first scenario, gradual API changes work reasonably well, as long as the effort involved in applying the fixes is sufficiently minor that developers can integrate them into their routine maintenance work. That is the situation I have described from my own past experience as a library author.

It's the second scenario where gradual changes are a real problem. Suppose I have a set of scripts from a thesis published in year X, and I need to understand them in detail in year X+5 for a related scientific project. If a script produces different results with NumPy 1.7 and NumPy 1.10, which result should I assume the author intended? People rarely write down which versions of their dependencies they used. Yes, they should, and it takes only a couple of lines (see the first sketch at the end of this message), but in practice it's a pain to do, in particular when you work on multiple machines and don't manage the Python installation yourself. In this rather frequent situation, the published scripts are ambiguous - I can't really know what they did when the author ran them.

There is a third scenario where this problem shows up: outdated legacy system installations, which are particularly frequent on clusters and supercomputers. For example, the cluster at my lab runs a CentOS version that is a few years old. CentOS is known for its conservatism, and therefore the default Python installation on that machine is based on Python 2.6, with correspondingly old NumPy versions. People do install recent application libraries there. Suppose someone runs code there that assumes the future semantics of diagonal(): it will silently yield wrong results (the second sketch at the end of this message shows how).

In summary, my point of view on breaking changes is:

1) Changes that can make legacy code fail can be introduced gradually. The right compromise between stability and progress must be figured out by the community.

2) Changes that yield different results for unmodified legacy code should never be allowed.

3) The best overall solution for API evolution is a version number that is visible in client code and that changes whenever a breaking change is introduced. This guarantees point 2). (The third sketch at the end of this message shows a client-side approximation.)

So if the community decides that it is important to change the behavior of diagonal(), this should be done in one of two ways:

a) Deprecate diagonal() and introduce a differently-named method with the new functionality. This will make old code fail rather than produce wrong results.

b) Accumulate this change with other such changes and call the new API "numpy2".

Konrad.
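
P.S. Here are the sketches referred to above. First, recording dependency versions: a minimal sketch that uses only sys.version and numpy.__version__; how and where to store the information alongside the results is of course up to the script author.

    import sys
    import numpy

    # Print the interpreter and NumPy versions next to the script's
    # results, so that a reader years later knows which semantics
    # the output reflects. Plain string concatenation keeps these
    # lines working under both Python 2 and Python 3.
    print("Python " + sys.version.split()[0])
    print("NumPy  " + numpy.__version__)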
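
Second, the diagonal() hazard made concrete. The per-version behavior in the comments is my reading of the transition plan and should be checked against the respective release notes:

    import numpy as np

    a = np.arange(9.0).reshape(3, 3)
    d = a.diagonal()

    # The next line is written for the announced final semantics,
    # in which diagonal() returns a writable view and the assignment
    # zeroes the diagonal of 'a'. On other versions it does
    # something else entirely:
    #   - pre-1.7: d is a copy, the write succeeds silently and 'a'
    #     stays unchanged - exactly the silent-wrong-results case;
    #   - 1.7/1.8: d is still a copy, but the write emits a warning;
    #   - 1.9: d is a read-only view and the write raises an error.
    d[:] = 0.0
    print(a)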
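
Third, a client-side approximation of point 3). A script cannot give NumPy a versioned API, but it can at least fail fast on a version mismatch instead of silently computing with different semantics. This uses only the public numpy.__version__ attribute; the expected version is just an example:

    import numpy

    # The NumPy major/minor series this script was written and
    # tested against. Refusing to run under any other series turns
    # a silent change of semantics into an explicit error.
    EXPECTED = (1, 9)

    actual = tuple(int(x) for x in numpy.__version__.split(".")[:2])
    if actual != EXPECTED:
        raise RuntimeError("script written for NumPy %d.%d, found %s"
                           % (EXPECTED + (numpy.__version__,)))

This inverts the responsibility - the script checks instead of the API declaring its version - but it does guarantee point 2) for that one script.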