[Numpy-discussion] Proposed Roadmap Overview

Sat Feb 18 20:18:21 EST 2012

Hi,

On Sat, Feb 18, 2012 at 2:54 PM, Travis Oliphant <travis at continuum.io> wrote:
>
> On Feb 18, 2012, at 4:03 PM, Matthew Brett wrote:
>
>> Hi,
>>
>> On Sat, Feb 18, 2012 at 1:57 PM, Travis Oliphant <travis at continuum.io> wrote:
>>> The C/C++ discussion is just getting started.  Everyone should keep in mind
>>> that this is not something that is going to happen quickly.   This will
>>> be a point of discussion throughout the year.    I'm not a huge supporter of
>>> C++, but C++11 does look like it's made some nice progress, and as I think
>>> about making a core-set of NumPy into a library that can be called by
>>> multiple languages (and even multiple implementations of Python), tempered
>>> C++ seems like it might be an appropriate way to go.
>>
>> Could you say more about this?  Do you have any idea when the decision
>> about C++ is likely to be made?  At what point does it make most sense
>> to make the argument for or against?  Can you suggest a good way for
>> us to be able to make more substantial arguments either way?
>
> I think early arguments against are always appropriate --- if you believe they have a chance of swaying Mark or Chuck, who are the strongest supporters of C++ at this point.  I will be quite nervous about going crazy with C++.  It was suggested that I use C++ 7 years ago when I wrote NumPy.  I didn't go that route then, largely because of compiler issues and ABI concerns, and because I knew C better than C++, so I felt it would have taken me longer to do something in C++.  I made the right decision for me.  If you think my C code is horrible, you would have been completely offended by whatever C++ I might have done at the time.
>
> But I basically agree with Chuck that there is a lot of C-code in NumPy and template-based-code that is really trying to be C++ spelled differently.
>
> The decision will not be made until NumPy 2.0 work is further along.  The most likely outcome is that Mark will develop something quite nice in C++, which he is already toying with, and we will either choose to use it in NumPy to build 2.0 on --- or not.  I'm interested in sponsoring Mark and working as closely as I can with him and Chuck to see what emerges.

Would it be fair to say, then, that you are expecting that the discussion
about C++ will mainly arise after Mark has written the code?   I
can see that it will be easier to be specific at that point, but there
must be a serious risk that it will be too late to seriously consider
an alternative approach.

>> Can you say a little more about your impression of the previous Cython
>> refactor and why it was not successful?
>>
>
> Sure.  This list actually deserves a long writeup about that.  First, there wasn't a "Cython-refactor" of NumPy.  There was a Cython-refactor of SciPy.  I'm not sure of its current status.  I'm still very supportive of that sort of thing.

I think I missed that - is it on git somewhere?

> I don't know if Cython ever solved the "raising an exception in a Fortran-called call-back" issue.   I used setjmp and longjmp in several places in SciPy originally in order to enable exceptions raised in a Python-callback that is wrapped in a C-function pointer and being handed to a Fortran-routine that asks for a function-pointer.
>
> What happened in NumPy was that the code was refactored to become a library.  I don't think much NumPy code actually ended up in Cython (the random-number generators have been in Cython from the beginning).
>
>
> The biggest problem with merging the code was that Mark Wiebe got active at about that same time :-)  He ended up changing several things in the code-base that made it difficult to merge in the changes.  Some of the bug fixes, memory-leak patches, and tests did get into the code-base, but the essential creation of the NumPy library did not make it.  There was some very good work done that I hope we can still take advantage of.
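
For anyone following along who has not seen the setjmp/longjmp trick
mentioned above, here is a minimal C sketch of the general idea as I
understand it. It is not SciPy's code: every name in it (user_fn,
thunk, sum_samples, guarded_sum, square, fail_at_three) is
hypothetical, sum_samples is only a stand-in for a Fortran routine
that takes a plain function pointer, and an integer error code stands
in for a raised Python exception.

```c
#include <setjmp.h>

/* Hypothetical callback type: returns 0 on success, nonzero on error
   (the nonzero return stands in for a raised Python exception). */
typedef int (*user_fn)(double x, double *out);

static jmp_buf escape_point;      /* where to resume if the callback fails */
static user_fn current_callback;  /* callback for the call in progress */

/* Thunk with the plain signature the numerical routine expects.
   It has no way to return an error, so on failure it jumps out. */
static double thunk(double x)
{
    double y = 0.0;
    if (current_callback(x, &y) != 0) {
        longjmp(escape_point, 1);  /* unwind straight past sum_samples */
    }
    return y;
}

/* Stand-in for a Fortran routine: it only knows how to call f. */
static double sum_samples(double (*f)(double), int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += f((double)i);
    }
    return total;
}

/* Wrapper: returns 0 and stores the sum on success, -1 if the
   callback reported an error part-way through. */
int guarded_sum(user_fn fn, int n, double *result)
{
    current_callback = fn;
    if (setjmp(escape_point) != 0) {
        return -1;  /* we arrive here via longjmp from thunk */
    }
    *result = sum_samples(thunk, n);
    return 0;
}

/* Two example callbacks: one always succeeds, one fails once x >= 3. */
int square(double x, double *out) { *out = x * x; return 0; }

int fail_at_three(double x, double *out)
{
    if (x >= 3.0) {
        return 1;
    }
    *out = x;
    return 0;
}
```

Calling guarded_sum(square, 4, &r) should return 0 with r holding the
sum of squares, while guarded_sum(fail_at_three, 10, &r) should return
-1 without the inner routine ever completing. The static state makes
this sketch non-reentrant, which is one reason a real wrapper would
need more care.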

> Another factor: the decision to add an extra layer of indirection makes small arrays that much slower.  I agree with Mark that in a core library we need to go the other way, with small arrays being completely allocated in the data-structure itself (reducing the number of pointer de-references

Does that imply there was a review of the refactor at some point to do
things like benchmarking?   Are there any sources to get started
trying to understand the nature of the Numpy refactor and where it ran
into trouble?  Was it just the small arrays?

> So, Cython did not play a major role on the NumPy side of things.   It played a very nice role on the SciPy side of things.

I guess Cython was attractive because the desire was to make a
stand-alone library?   If that is still the goal, presumably that
excludes Cython from serious consideration?  What are the primary
advantages of making the standalone library?  Are there any serious
disbenefits?

Thanks a lot for the reply,

Matthew