[Numpy-discussion] Making numpy sensible: backward compatibility please

Fri Sep 28 17:23:27 EDT 2012

Hi numpy developers,

First of all, thanks a lot for the hard work you put in numpy. I know
very well that maintaining such a core library is a lot of effort and a
service to the community. But "with great dedication, comes great
responsibility" :).

I find that Numpy is a bit of a wild horse, a moving target. I have just
fixed a fairly nasty bug in scikit-learn [1] that was introduced by a
change of semantics in ordering when doing copies with numpy. I have been
running, working on, and developing scikit-learn while tracking numpy's
development tree and, as far as I can tell, I never saw warnings raised
in our code that something was going to change, or had changed.

In other settings, changes in array inheritance and 'base' propagation
have made some of our memmap-related use cases that used to work under
previous numpy impossible [2]. Others have been hitting difficulties
related to these changes in behavior [3]. Not to mention the new casting
rules (default: 'same_kind') that break a lot of code, or the ABI change
that, while not done on purpose, ended up causing us a lot of pain.
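
To make the casting point concrete, here is a minimal sketch of what the
'same_kind' rule accepts and rejects. It assumes a reasonably recent
numpy; the array `a` and the flag `rejected` are illustrative names only,
not code taken from scikit-learn or from the threads cited here.

```python
import numpy as np

# 'same_kind' permits casts within a kind (float64 -> float32)
# but rejects casts across kinds (float64 -> int64).
within_kind = np.can_cast(np.float64, np.float32, casting="same_kind")
across_kind = np.can_cast(np.float64, np.int64, casting="same_kind")
print(within_kind, across_kind)

a = np.arange(3)  # integer array
try:
    # The sum is a float, so storing it in the integer array `a`
    # requires a cross-kind cast, which 'same_kind' refuses.
    np.add(a, 1.5, out=a, casting="same_kind")
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

Code that relied on the float result being silently truncated into an
integer output has to request that explicitly, for instance by passing
casting="unsafe".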

My point here is that having code that works and gives correct results
with new releases of numpy is more challenging than it should be. I
cannot claim that I disagree with the changes that I mention above. They
were all implemented for a good reason and can all be considered as
overall improvements to numpy. However the situation is that given a
complex codebase relying on numpy that works at a time t, the chances
that it works flawlessly at time t + 1y are thin. I am not too proud that
we managed to release scikit-learn 0.12 with a very ugly bug under numpy
1.7. That happened although we have 90% test coverage, buildbots under
different numpy versions, and a lot of people, including me, using our
development tree on a day to day basis with bleeding edge numpy. Most
code in research settings or R&D industry does not benefit from such
software engineering and I believe is much more likely to suffer from
changes in numpy.

I think that this is a cultural issue: priority is not given to stability
and backward compatibility. I think that this culture is very much
ingrained in the Python world, that likes iteratively cleaning its
software design. For instance, I have the feeling that in scikit-learn
we probably fall into the same trap. That said, such a
behavior cannot fare well for a base scientific environment. People tell
me that if they take old matlab code, the odds that it will still work
are much higher than with Python code. As a geek, I tend to reply that we
get a lot out of this mobility, because we accumulate less cruft.
However, in research settings, for reproducibility reasons, one needs to
be able to pick up an old codebase and trust its results without knowing
its intricacies.

From a practical standpoint, I believe that people implementing large
changes to the numpy codebase, or any other core scipy package, should
think really hard about their impact. I do realise that the changes are
discussed on the mailing lists, but there is a lot of activity to follow
and I don't believe that it is possible for many of us to monitor the
discussions. Also, putting more emphasis on backward compatibility is
possible. For instance, the 'order' parameter added to np.copy could have
defaulted to the old behavior, 'K', for a year, with a
DeprecationWarning, same thing for the casting rules.
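
The transition pattern suggested above could look roughly like the sketch
below. It is only an illustration: `copy_with_transition` and the `_UNSET`
sentinel are hypothetical names, not part of numpy, and the choice of 'K'
as the old default follows the description in this email.

```python
import warnings
import numpy as np

_UNSET = object()  # sentinel: detects "caller did not pass order"

def copy_with_transition(a, order=_UNSET):
    """Hypothetical wrapper, not part of numpy: keep the old default
    for a transition period and warn that it is going to change."""
    if order is _UNSET:
        warnings.warn(
            "the default 'order' will change in a future release; "
            "pass order= explicitly to keep the current behavior",
            DeprecationWarning,
            stacklevel=2,
        )
        order = "K"  # old behavior, as described in this email
    return np.copy(a, order=order)

f = np.asfortranarray(np.arange(6).reshape(2, 3))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not hidden
    g = copy_with_transition(f)              # warns, keeps memory layout
    h = copy_with_transition(f, order="C")   # explicit choice, no warning
```

Callers who pass `order` explicitly never see the warning, so downstream
projects get a release cycle to pin the behavior they depend on before
the default moves.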

Thank you for reading this long email. I don't mean it to be a complaint
about the past, but more a suggestion on something to keep in mind when
making changes to core projects.

Cheers,

Gaël

____

[1] https://github.com/scikit-learn/scikit-learn/commit/7842748cf777412c506a8c0ed28090711d3a3783

[2] http://mail.scipy.org/pipermail/numpy-discussion/2012-September/063985.html

[3] http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063126.html