<br><br><div class="gmail_quote">On Fri, Sep 28, 2012 at 3:23 PM, Gael Varoquaux <span dir="ltr">&lt;<a href="mailto:gael.varoquaux@normalesup.org" target="_blank">gael.varoquaux@normalesup.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi numpy developers,<br>

<br>

First of all, thanks a lot for the hard work you put in numpy. I know<br>

very well that maintaining such a core library is a lot of effort and a<br>

service to the community. But "with great dedication, comes great<br>

responsibility" :).<br>

<br>

I find that Numpy is a bit of a wild horse, a moving target. I have just<br>

fixed a fairly nasty bug in scikit-learn [1] that was introduced by<br>

a change of semantics in ordering when doing copies with numpy. I have been<br>

running, working on, and developing scikit-learn while tracking numpy's<br>

development tree and, as far as I can tell, I never saw warnings raised<br>

in our code that something was going to change, or had changed.<br></blockquote><div><br>IIRC, the copy order was not specced and should not have been assumed. Some copy orders are faster than others and I believe numpy now takes advantage of that fact. Admittedly, numpy has started to move, mostly due to Mark's work, but I don't think that is all bad; I feel that it has to move some and the users need to be pushed a bit. It's a balancing act, but I don't think copy order goes over the line. One way to look at it is that 1.8 might have been a better release to make the change; on the other hand, Mark has moved on and dropped into the ContinuumIO black hole. Sometimes you need to ride the train when it is there at the station.<br>

 <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

In other settings, changes in array inheritance and 'base' propagation<br>

have broken some of our memmap-related use cases that used to work<br>

under previous numpy versions [2]. Others have been hitting difficulties related<br>

to these changes in behavior [3]. Not to mention the new casting rules<br>

(default: 'same_kind') that break a lot of code, or the ABI change that,<br>

while not done on purpose, ended up causing us a lot of pain.<br></blockquote><div><br>IIRC, the base propagation changes fixed a bug, an old bug.<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br>

My point here is that having code that works and gives correct results<br>

with new releases of numpy is more challenging than it should be. I<br>

cannot claim that I disagree with the changes that I mention above. They<br>

were all implemented for a good reason and can all be considered as<br>

overall improvements to numpy. However, the situation is that given a<br>

complex codebase relying on numpy that works at a time t, the chances<br>

that it works flawlessly at time t + 1y are thin. I am not too proud that<br>

we managed to release scikit-learn 0.12 with a very ugly bug under numpy<br>

1.7. That happened although we have 90% test coverage, buildbots under<br>

different numpy versions, and a lot of people, including me, using our<br>

development tree on a day to day basis with bleeding edge numpy. Most<br>

code in research settings or industrial R&amp;D does not benefit from such<br>

software engineering and I believe is much more likely to suffer from<br>

changes in numpy.<br></blockquote><div><br>If the behaviour is not specified and tested, there is no guarantee that it will continue.<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br>

I think that this is a cultural issue: priority is not given to stability<br>

and backward compatibility. I think that this culture is very much<br>

ingrained in the Python world, that likes iteratively cleaning its<br>

software design. For instance, I have the feeling that in<br>

scikit-learn, we probably fall into the same trap. That said, such a<br>

behavior cannot fare well for a base scientific environment. People tell<br>

me that if they take old matlab code, the odds that it will still work<br>

are much higher than with Python code. As a geek, I tend to reply that we<br>

get a lot out of this mobility, because we accumulate less cruft.<br>

However, in research settings, for reproducibility reasons, one needs to<br>

be able to pick up an old codebase and trust its results without knowing<br>

its intricacies.<br></blockquote><div><br>Bitch, bitch, bitch. Look, I know you are pissed and venting a bit, but this problem could have been detected and reported 6 months ago, that is, unless it is new due to development on your end. <br>

 <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

From a practical standpoint, I believe that people implementing large<br>

changes to the numpy codebase, or any other core scipy package, should<br>

think really hard about their impact. I do realise that the changes are<br>

discussed on the mailing lists, but there is a lot of activity to follow<br>

and I don't believe that it is possible for many of us to monitor the<br>

discussions. Also, putting more emphasis on backward compatibility is<br>

possible. For instance, the 'order' parameter added to np.copy could have<br>

defaulted to the old behavior, 'K', for a year, with a<br>

DeprecationWarning, same thing for the casting rules.<br>

<br>

Thank you for reading this long email. I don't mean it to be a complaint<br>

about the past, but more a suggestion on something to keep in mind when<br>

making changes to core projects.<br>

<br></blockquote><div><br>Chuck <br></div></div>
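<div><br>For reference, the transition suggested above (keep the old default for a while and emit a DeprecationWarning) could be sketched roughly as follows. This is only an illustration using the standard library: the function name copy_with_order, the _UNSET sentinel, and the LEGACY_ORDER constant are hypothetical and are not numpy API; the legacy value 'K' is simply the one named in the quoted message.<br></div>
<pre>
```python
import warnings

# Hypothetical names throughout: this is not numpy's implementation.
_UNSET = object()   # sentinel: tells "argument omitted" apart from an explicit value
LEGACY_ORDER = "K"  # legacy default as named in the quoted message; illustrative only


def copy_with_order(values, order=_UNSET):
    """Copy `values` and report which order was used.

    Callers that omit `order` keep the legacy behavior for now,
    but are warned that the default is scheduled to change.
    """
    if order is _UNSET:
        warnings.warn(
            "the default 'order' will change in a future release; "
            "pass order explicitly to keep the current behavior",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this function
        )
        order = LEGACY_ORDER
    if order not in ("C", "F", "K"):
        raise ValueError("order must be 'C', 'F' or 'K'")
    # A plain list copy stands in for the real array copy.
    return list(values), order
```
</pre>
<div>Callers that already pass order explicitly see no warning, so only code relying on the implicit default is nudged to change during the transition period.<br></div>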