[Numpy-discussion] Going toward time-based release ?

Stéfan van der Walt stefan at sun.ac.za
Mon May 12 03:09:39 EDT 2008


2008/5/12 Jarrod Millman <millman at berkeley.edu>:
> I agree, but we also have the problem that we don't have adequate
> tests.  I don't think we realized the extent to which MaskedArrays
> necessitated code rewrites until Charles Doutriaux pointed out how
> much difficulty the change was causing to their code base.

There's a valuable lesson to be learnt here: unit tests provide a
contract between the developer and the user.  When I did the
MaskedArray merge, I made very sure that we strictly stuck to our
contract -- the unit tests for numpy.core.ma (which ran without
failure).  Unfortunately, according to Charles' experience, that
contract was inadequate.  We shouldn't be caught with our pants down
like that.
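
To make that concrete, the kind of contract I mean looks like this (a
minimal sketch against the current numpy.ma API; the test names are
illustrative, not taken from our actual suite):

    import numpy.ma as ma

    def test_mean_skips_masked_values():
        # The contract: statistics ignore masked entries.
        x = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
        assert x.mean() == 2.0

    def test_mask_propagates_through_arithmetic():
        # The contract: arithmetic preserves the mask.
        x = ma.array([1, 2, 3], mask=[0, 1, 0])
        assert (x + 1).mask[1]

Every behaviour a user relies on should be written down as a test like
these -- anything not in the suite is, in effect, not part of the
contract.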

The matplotlib guys used Pierre's maskedarray for a while before we
did the merge, so we had good reason to believe that it was a vast
improvement (and it was).  I agree that the best policy would have
been to make a point release right before the merge, but
realistically, if we had to wait for such a release, the new
maskedarrays would still not have been merged.

Which brings me to my next point: we have very limited (human)
resources, but releasing frequently is paramount.  To what extent can
we automate the release process?  I've asked this question before, but
I haven't had a clear answer: are the packages currently built
automatically?  Why don't we get the buildbots to produce nightly
snapshot packages, so that when we tell users "try the latest SVN
version, it's been fixed there" it doesn't send them into a dark
depression?
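
As a straw man, the snapshot step itself could be as small as this (a
hypothetical sketch, not our actual buildbot configuration; it assumes
svn and python are on the PATH):

    import os
    import subprocess

    REPO = "http://svn.scipy.org/svn/numpy/trunk"
    WORKDIR = "numpy-trunk"

    def run(cmd, cwd=None):
        subprocess.check_call(cmd, cwd=cwd)

    # Check out the trunk, or bring an existing checkout up to date.
    if os.path.isdir(WORKDIR):
        run(["svn", "update"], cwd=WORKDIR)
    else:
        run(["svn", "checkout", REPO, WORKDIR])

    # Build a source snapshot tarball in WORKDIR/dist/.
    run(["python", "setup.py", "sdist"], cwd=WORKDIR)

Run that from cron on one machine per platform (swapping sdist for the
appropriate bdist command) and we would always have a fresh package to
point users at.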

As for the NumPy unit tests: I have placed coverage reports online
(http://mentat.za.net/numpy/coverage).  This only covers Python (not
extension) code, but getting that part to 100% coverage is feasible,
and wouldn't take much effort.  The much more important issue is
having the C extensions tested, and if anyone can figure out a way to
get gcov to generate those coverage reports, I'd be in seventh
heaven.  Thus far, the only way I know of is to build one large,
static Python binary that includes numpy.
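
For reference, the obvious attempt against the shared extension
modules would look something like the following -- hypothetical and,
as far as I know, unverified, which is exactly the problem.  The gcc
flags are the standard gcov ones; the object directory passed to gcov
will depend on the build layout:

    import glob
    import os
    import subprocess

    env = os.environ.copy()
    env["CFLAGS"] = "-fprofile-arcs -ftest-coverage"  # emit .gcno files
    env["LDFLAGS"] = "-fprofile-arcs"                 # link the gcov runtime

    # Build the extensions in place so coverage output is easy to find.
    subprocess.check_call(["python", "setup.py", "build_ext", "--inplace"],
                          env=env)

    # Running the test suite writes .gcda counter files.
    subprocess.check_call(["python", "-c", "import numpy; numpy.test()"],
                          env=env)

    # Annotate each C source; adjust -o to wherever the objects landed.
    for src in glob.glob("numpy/core/src/*.c"):
        subprocess.call(["gcov", "-o", "build", src])

If anyone can get this (or any variation) to produce sane reports,
please post the recipe.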

Memory errors:  Albert Strasheim recently changed his build client
config to run Valgrind on the NumPy code.  Surprise, surprise -- we
have introduced new memory errors since the last release.  In the
future, when *any* change is made to the C code:

a) Add a unit test for the change, unless the test already exists (and
I suggest we *strongly* enforce this policy).
b) Document your change if it is not immediately clear what it does.
c) Run the test suite through Valgrind or, if you're not on a Linux
platform, look at the buildbot (http://buildbot.scipy.org) output; a
sketch of such a Valgrind run follows this list.
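
The Valgrind run in (c) amounts to the following (a sketch; the
suppression file is Misc/valgrind-python.supp from the CPython source
tree, copied into the working directory):

    import subprocess

    subprocess.check_call([
        "valgrind", "--tool=memcheck", "--leak-check=yes",
        "--error-exitcode=1",                   # fail the buildbot step on errors
        "--suppressions=valgrind-python.supp",  # hide known CPython false positives
        "python", "-c", "import numpy; numpy.test()",
    ])

Without the suppressions, the output drowns in harmless noise from
CPython's own allocator.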

Finally, our patch acceptance process is poor.  It would be good if we
could have a more formal system for reviewing incoming patches and
*also* our own.  I know Ondrej Certik had a review board in place for
SymPy at some stage, so we could ask him what their experience was.

So, +1 for more frequent releases, +1 for more tests and +1 for good
developer discipline.

Regards
Stéfan


