[Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)

Nathaniel Smith njs at pobox.com
Thu Jun 5 18:41:33 EDT 2014

On Thu, Jun 5, 2014 at 3:36 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
> @nathaniel IIRC, one of the objections to the missing values work was that
> it changed the underlying array object by adding a couple of variables to
> the structure. I'm willing to do that sort of thing, but it would be good to
> have general agreement that that is acceptable.

I can't think of a reason why adding new variables to the structure *per
se* would be objectionable to anyone. IIRC the objection you're
thinking of wasn't to the existence of new variables, but to their
effect on compatibility: their semantics meant that every piece of
legacy C code that worked with ndarrays had to be updated to check for
the new variables before it could correctly interpret the ->data
field, and if it wasn't updated it would just return incorrect
results. And there wasn't really a clear story for how we were going
to detect and fix all this legacy code. This specific kind of
compatibility break does seem pretty objectionable, but that's because
of the details of its behaviour, not because variables in general are
problematic, I think.
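
To make the failure mode concrete, here is a toy sketch in Python. It is
not NumPy's real struct layout or the actual missing-values design;
ToyArray, its mask field, and both functions are hypothetical names used
only for illustration. The point is that code written before the new
field existed still runs, and just quietly returns the wrong answer:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyArray:
    """Toy stand-in for an array struct (hypothetical, not NumPy's layout)."""
    data: List[float]
    # Hypothetical new field: True marks the element as missing.
    mask: Optional[List[bool]] = None


def legacy_sum(arr: ToyArray) -> float:
    # Written before `mask` existed, so it trusts every element of `data`.
    return sum(arr.data)


def updated_sum(arr: ToyArray) -> float:
    # Checks the new field before interpreting `data`.
    if arr.mask is None:
        return sum(arr.data)
    return sum(x for x, missing in zip(arr.data, arr.mask) if not missing)


arr = ToyArray(data=[1.0, 2.0, 999.0], mask=[False, False, True])
legacy_result = legacy_sum(arr)    # includes the masked 999.0, no error raised
updated_result = updated_sum(arr)  # skips the masked element
```

Nothing in legacy_sum fails or warns, which is what makes this kind of
break hard to detect in existing code.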

> As to blaze/dynd, I'd like to steal bits here and there, and maybe in the
> long term base numpy on top of it with a compatibility layer. There is a lot
> of thought and effort that has gone into those projects and we should use
> what we can. As is, I think numpy is good for another five to ten years and
> will probably hang on for fifteen, but it will be outdated by the end of
> that period. Like great whites, we need to keep swimming just to have
> oxygen. Software projects tend to be obligate ram ventilators.

I worry a bit that this could become a self-fulfilling prophecy.
Plenty of software survives longer than that; the Linux kernel hasn't
had a "real" major number increase [1] since 2.6.0, more than 10 years
ago, and it's still an extraordinarily vital project. Partly this is
because they have resources we don't etc., but partly it's just
because they've decided that incremental change is how they're going
to do things, and approached each new feature with that in mind. And
in ten years they haven't yet found any features that required a
serious compatibility break.

This is a pretty minor worry though -- we don't have to agree about
what will happen in 10 years to agree about what to do now :-).

[1] http://www.pcmag.com/article2/0,2817,2388926,00.asp

> The Python 3 experience is definitely something we want to avoid. And while
> blaze does big data and offers some nice features, I don't know that it
> offers compelling reasons to upgrade for the more ordinary user at this time,
> so I'd like to sort of slip it into numpy if possible.
> If we do start moving numpy forward in more radical steps, we should try to
> have some agreement beforehand as to what sort of changes are acceptable.
> For instance, to maintain backward compatibility, is it sufficient that a
> recompile will do the job, or do we require forward compatibility for
> extensions compiled against earlier releases?

I find it hard to discuss these things in general, since specific
compatibility issues usually involve complicated trade-offs -- will
every package have to recompile or just some of them, if they don't
will it be a nice error message or a segfault, is there some way we
can issue warnings ahead of time for the offending behaviour, etc.

That said, my vote is that if there's a change that (a) can't be done
some other way, (b) requires a recompile, (c) doesn't cause segfaults
but rather produces some sensible error message like "ABI mismatch
please recompile", (d) is a change that's worth the bother (this
determination to include at least canvassing the list to check that
users in general agree that it's worth it), then yeah we should do it.
I don't anticipate that this will happen very often given how far
we've gotten without it, but yeah.
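
As a rough illustration of point (c), here is a minimal sketch in Python
of a check that fails loudly instead of crashing. The version numbers,
ABIMismatchError, and check_abi are all hypothetical; a real check would
live in the C-level import machinery, and this only shows the shape of
the behaviour:

```python
class ABIMismatchError(ImportError):
    """Raised when an extension was built against a different ABI (hypothetical)."""


# Hypothetical ABI version exported by the running library.
RUNTIME_ABI_VERSION = 2


def check_abi(compiled_against: int, runtime: int = RUNTIME_ABI_VERSION) -> None:
    # Refuse to load rather than risk misreading memory later.
    if compiled_against != runtime:
        raise ABIMismatchError(
            "ABI mismatch: extension was built against ABI version %d, "
            "but the running library is ABI version %d; please recompile"
            % (compiled_against, runtime)
        )


# An extension built against the old ABI gets a readable error at import time.
try:
    check_abi(compiled_against=1)
    raised = False
    message = ""
except ABIMismatchError as exc:
    raised = True
    message = str(exc)
```

Subclassing ImportError here is a design choice for the sketch, so that
callers who already handle failed imports would see this failure too.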

It's possible we should be making a fuss now on distutils-sig about
handling these cases in the brave new world of wheels, so that the
relevant features have some chance of existing by the time we need
them (e.g., 'pip install --upgrade numpy' should become smart enough to detect
when this necessitates an upgrade of scipy).
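
A sketch of the kind of check I have in mind, in Python. This is not an
existing pip or wheel feature; the function name and the idea that each
installed package records the ABI version it was built against are
assumptions for illustration only:

```python
from typing import Dict, List


def extensions_needing_rebuild(new_abi: int, built_against: Dict[str, int]) -> List[str]:
    """Return installed packages whose recorded ABI differs from the new one.

    `built_against` maps package name -> ABI version recorded at build time
    (hypothetical metadata that an installer would need to have available).
    """
    return sorted(name for name, abi in built_against.items() if abi != new_abi)


# Hypothetical installed state: scipy was built against ABI 1, toy_ext against 2.
installed = {"scipy": 1, "toy_ext": 2}

# Upgrading the core library to ABI 2 should flag scipy for an upgrade too.
to_rebuild = extensions_needing_rebuild(2, installed)
```

The hard part is not this comparison but getting the build-time ABI
recorded somewhere the installer can read it, which is why it seems worth
raising early.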

> Do we stay with C or should we
> support C++ code with its advantages of smart pointers, exception handling,
> and templates? We will need a certain amount of flexibility going forward
> and we should decide, or at least discuss, such issues up front.

This is an easier question, since it doesn't affect end-users at all
(at least, so long as they have a decent toolchain available, but
scipy already requires C++). Personally I'd like to see a more
concrete plan for how exactly C++ would be used and why it's better
than alternatives (as mentioned I have the vague idea that Cython
would be even better), but I can't see why we should rule it out up
front either.


Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
