[Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)

Charles R Harris charlesr.harris at gmail.com
Thu Jun 5 09:51:56 EDT 2014

On Thu, Jun 5, 2014 at 6:40 AM, David Cournapeau <cournape at gmail.com> wrote:

> On Thu, Jun 5, 2014 at 3:36 AM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>> On Wed, Jun 4, 2014 at 7:29 PM, Travis Oliphant <travis at continuum.io>
>> wrote:
>>> Believe me, I'm all for incremental changes if it is actually possible
>>> and doesn't actually cost more.  It's also why I've been silent until now
>>> about anything we are doing being a candidate for a NumPy 2.0.  I
>>> understand the challenges of getting people to change.  But, features and
>>> solid improvements *will* get people to change --- especially if their new
>>> library can be used along with the old library and the transition can be
>>> done gradually. Python 3's struggle is the lack of features.
>>>
>>> At some point there *will* be a NumPy 2.0.   What features go into NumPy
>>> 2.0, how much backward compatibility is provided, and how much porting is
>>> needed to move your code from NumPy 1.X to NumPy 2.X is the real user
>>> question --- not whether it is characterized as "incremental" change or
>>> "re-write".     What I call a re-write and what you call an
>>> "incremental-change" are two points on a spectrum and likely overlap
>>> significantly if we really compared what we are thinking about.
>>>
>>> One huge benefit that came out of the numeric / numarray / numpy
>>> transition that we mustn't forget about was actually the extended buffer
>>> protocol and memory view objects.  This really does allow multiple array
>>> objects to co-exist and libraries to use the object that they prefer in a
>>> way that did not exist when Numarray / numeric / numpy came out.    So, we
>>> shouldn't be afraid of that world.   The existence of easy package managers
>>> to update environments to try out new features and have applications on a
>>> single system that use multiple versions of the same library is also
>>> something that didn't exist before and that will make any transition easier
>>> for users.
>>>
>>> One thing I regret about my original work on NumPy is that I didn't
>>> have the foresight, skill, and understanding to work on a more
>>> extended and better-designed multiple-dispatch system so that multiple
>>> array objects could participate together in an expression flow.   The
>>> __numpy_ufunc__ mechanism gives enough capability in that direction that it
>>> may be better now.
>>>
>>> Ultimately, I don't disagree that NumPy can continue to exist in
>>> "incremental" change mode (though if you are swapping out whole swaths of
>>> C-code for Cython code --- it sounds a lot like a "re-write") as long as
>>> there are people willing to put the effort into changing it.   I think this
>>> is actually benefited by the existence of other array objects that are
>>> pushing the feature envelope without the constraints --- in much the same
>>> way that the Python standard library is benefited by many versions of
>>> different capabilities being tried out before moving into the standard
>>> library.
>>>
>>> I remain optimistic that things will continue to improve in multiple
>>> ways --- if a little "messier" than any of us would conceive individually.
>>> It *is* great to see all the PRs coming from multiple people on NumPy
>>> and all the new energy around improving things whether great or small.
>>
>> @nathaniel IIRC, one of the objections to the missing values work was
>> that it changed the underlying array object by adding a couple of variables
>> to the structure. I'm willing to do that sort of thing, but it would be
>> good to have general agreement that that is acceptable.
>
> I think changing the ABI for some versions of numpy (2.0, whatever) is
> acceptable. There is little doubt that the ABI will need to change to
> accommodate a better and more flexible architecture.
>
> Changing the C API is trickier: I am not up to date on the changes from
> the last 2-3 years, but at that time, most things could have been changed
> internally without breaking much, though I did not go far enough to
> estimate what the performance impact could be (if any).
>
>> As to blaze/dynd, I'd like to steal bits here and there, and maybe in the
>> long term base numpy on top of it with a compatibility layer. There is a
>> lot of thought and effort that has gone into those projects and we should
>> use what we can. As is, I think numpy is good for another five to ten years
>> and will probably hang on for fifteen, but it will be outdated by the end
>> of that period. Like great whites, we need to keep swimming just to have
>> oxygen. Software projects tend to be obligate ram ventilators.
>>
>> The Python 3 experience is definitely something we want to avoid. And
>> while blaze does big data and offers some nice features, I don't know that
>> it offers compelling reasons to upgrade to the more ordinary user at this
>> time, so I'd like to sort of slip it into numpy if possible.
>>
>> If we do start moving numpy forward in more radical steps, we should try
>> to have some agreement beforehand as to what sort of changes are
>> acceptable. For instance, to maintain backward compatibility, is it
>> sufficient that a recompile will do the job, or do we require forward
>> compatibility for extensions compiled against earlier releases? Do we stay
>> with C or should we support C++ code with its advantages of smart pointers,
>> exception handling, and templates? We will need a certain amount of
>> flexibility going forward and we should decide, or at least discuss, such
>> issues up front.
>
> Last time the C++ discussion was brought up, no consensus could be reached. I
> think quite a few radical changes can be made without that consensus
> already, though others may disagree there.
>
> IMO, what is needed the most is refactoring the internals to extract the
> Python C API low level from the rest of the code, as I think that's the
> main bottleneck to getting more contributors (or getting new core features
> more quickly).
What do you mean by "extract the Python C API"?
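
As an aside on the buffer protocol point quoted above: a minimal sketch of what "multiple array objects can co-exist" looks like in practice. This is only an illustration, not code from NumPy or from anyone in this thread. It uses nothing but the standard library (array.array as the owner of the memory, memoryview as a second consumer), and the function name shared_view_demo is made up for the example.

```python
from array import array


def shared_view_demo():
    """Share one block of memory between two objects via the buffer protocol."""
    # array.array owns the memory and exports it through the buffer protocol.
    source = array("d", [1.0, 2.0, 3.0, 4.0])

    # memoryview consumes the exported buffer without copying it.
    flat = memoryview(source)
    flat[1] = 20.0  # the write goes straight through to `source`

    # Reinterpret the same memory as a 2x2 grid of doubles. A cast to a
    # multi-dimensional shape has to go through a byte format first.
    grid = flat.cast("B").cast("d", shape=[2, 2])

    return source.tolist(), grid.tolist(), grid[1, 0]


if __name__ == "__main__":
    src, grid, item = shared_view_demo()
    print(src)   # the owner sees the write made through the view
    print(grid)  # same memory, viewed as two rows
    print(item)
```

Any object that exports a buffer could stand in for array.array here, which is the sense in which a library can work with whichever array object it prefers.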
