[Numpy-discussion] How a transition to C++ could work

David Cournapeau cournape at gmail.com
Sun Feb 19 03:56:00 EST 2012


Hi Mark,

thank you for joining this discussion.

On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> The suggestion of transitioning the NumPy core code from C to C++ has
> sparked a vigorous debate, and I thought I'd start a new thread to give my
> perspective on some of the issues raised, and describe how such a transition
> could occur.
>
> First, I'd like to reiterate the gcc rationale for their choice to switch:
> http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
>
> In particular, these points deserve emphasis:
>
> The C subset of C++ is just as efficient as C.
> C++ supports cleaner code in several significant cases.
> C++ makes it easier to write cleaner interfaces by making it harder to break
> interface boundaries.
> C++ never requires uglier code.

I think those arguments will not be very useful: they are subjective,
and unlikely to convince people who prefer C to C++.

>
> There are concerns about ABI/API interoperability and interactions with C++
> exceptions. I've dealt with these types of issues on enough platforms to
> know that while they're important, they're a lot easier to handle than the
> issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that
> providing a C API from a C++ library is no harder than providing a C API
> from a C library.

This needs more detail. I have some experience in both areas as well,
and mine is quite different. To reiterate a few examples that worry me:
  - How can you ensure that exceptions thrown in C++ never cross
.so/.dll boundaries? How can one make sure that C++ extensions built
with different compilers work together? Is avoiding exceptions
altogether, as zeromq does, acceptable? (It would be nice to find out
more about the decisions the zeromq team made about their use of C++.)
I cannot find a recent example, but I have seen errors similar to
this one (http://software.intel.com/en-us/forums/showthread.php?t=42940)
quite a few times.
  - How can you expose C++-heavy features in C? I would expect you
would like to use templates for iterators in numpy: how can you make
them available to 3rd-party extensions without requiring C++?
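
To make the two worries above concrete, here is a minimal sketch of the
usual pattern: an opaque handle over one instantiation of a template,
with every exception caught before it leaves the library. All names
(StridedIter, demo_iter, the DEMO_* codes) are hypothetical and not part
of numpy; this only illustrates the general technique, not a proposed
design.

```cpp
// Hypothetical sketch: none of these names exist in numpy.
#include <stddef.h>
#include <cassert>
#include <new>
#include <stdexcept>

// Internal C++ side: a templated strided iterator (illustrative only).
template <typename T>
class StridedIter {
public:
    StridedIter(const T* data, size_t count, size_t stride)
        : data_(data), count_(count), stride_(stride), pos_(0) {
        if (stride == 0) throw std::invalid_argument("stride must be non-zero");
    }
    // Writes the next element to *out; returns false once exhausted.
    bool next(T* out) {
        if (pos_ >= count_) return false;
        *out = data_[pos_ * stride_];
        ++pos_;
        return true;
    }
private:
    const T* data_;
    size_t count_, stride_, pos_;
};

// Public C side: opaque handle, one concrete instantiation, error codes.
extern "C" {
typedef struct demo_iter demo_iter;  // C callers only ever see a pointer
enum { DEMO_OK = 0, DEMO_DONE = 1,
       DEMO_ERR_MEMORY = -1, DEMO_ERR_VALUE = -2, DEMO_ERR_UNKNOWN = -3 };
int demo_iter_new_double(const double* data, size_t count, size_t stride,
                         demo_iter** out);
int demo_iter_next_double(demo_iter* it, double* value);
void demo_iter_free(demo_iter* it);
}

struct demo_iter {
    StridedIter<double> impl;
    demo_iter(const double* d, size_t n, size_t s) : impl(d, n, s) {}
};

// Rethrows the in-flight exception and maps it to an error code, so that
// nothing C++-specific crosses the shared-library boundary.
static int translate_exception() {
    try { throw; }
    catch (const std::bad_alloc&) { return DEMO_ERR_MEMORY; }
    catch (const std::invalid_argument&) { return DEMO_ERR_VALUE; }
    catch (...) { return DEMO_ERR_UNKNOWN; }
}

extern "C" int demo_iter_new_double(const double* data, size_t count,
                                    size_t stride, demo_iter** out) {
    assert(out != NULL);
    try {
        *out = new demo_iter(data, count, stride);
        return DEMO_OK;
    } catch (...) {
        *out = NULL;
        return translate_exception();
    }
}

extern "C" int demo_iter_next_double(demo_iter* it, double* value) {
    try {
        return it->impl.next(value) ? DEMO_OK : DEMO_DONE;
    } catch (...) {
        return translate_exception();
    }
}

extern "C" void demo_iter_free(demo_iter* it) { delete it; }
```

Note what this costs: each element type needs its own set of C entry
points, and every entry point needs its own try/catch. Nor does it
address extensions that are themselves written in C++ and built with a
different compiler or runtime.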

>
> It's worth comparing the possibility of C++ versus the possibility of other
> languages, and the ones that have been suggested for consideration are D,
> Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language
> has to interact naturally with the CPython API. It needs to provide direct
> access to all the various sizes of signed int, unsigned int, and float. It
> needs to have mature compiler support wherever we want to deploy NumPy.
> Taken together, these requirements eliminate a majority of these
> possibilities. From these criteria, the only languages which seem to have a
> clear possibility for the implementation of Numpy are C, C++, and D. For D,
> I suspect the tooling is not mature enough, but I'm not 100% certain of
> that.

While I agree that no other language is realistic today, staying in C
has the nice advantage that we can more easily switch to one of them if
it matures (Rust or D; Go, RPython, C# and Java can be dismissed right
away for fundamental technical reasons). This is not a very strong
argument against using C++, obviously.

>
> 1) Immediately after branching for 1.7, we minimally patch all the .c files
> so that they can build with a C++ compiler and with a C compiler at the same
> time. Then we rename all .c -> .cpp, and update the build systems for C++.
> 2) During the 1.8 development cycle, we heavily restrict C++ feature usage.
> But, where a feature implementation would be arguably easier and less
> error-prone with C++, we allow it. This is a period for learning about C++
> and how it can benefit NumPy.
> 3) After the 1.8 release, the community will have developed more experience
> with C++, and will be in a better position to discuss a way forward.
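
For reference, the "minimal patching" in step 1 typically comes down to
changes like the ones sketched below: explicit casts where C relies on
implicit void* conversions, and extern "C" guards so that exported
symbols keep C linkage. The file and the function demo_copy_doubles are
hypothetical, not taken from the numpy sources; identifiers that clash
with C++ keywords (new, class, ...) would also need renaming.

```cpp
/* Hypothetical file, not taken from the numpy sources.
   Builds unchanged with both a C compiler and a C++ compiler. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef __cplusplus
extern "C" {
#endif

/* C linkage is kept so that existing C callers still find the symbol
   after the file is renamed from .c to .cpp. */
double *demo_copy_doubles(const double *src, size_t n);

#ifdef __cplusplus
}
#endif

double *demo_copy_doubles(const double *src, size_t n)
{
    double *dst;

    assert(src != NULL);
    /* C accepts the implicit void* -> double* conversion, C++ does not:
       the explicit cast is the kind of change step 1 requires. */
    dst = (double *)malloc(n * sizeof(double));
    if (dst == NULL) {
        return NULL;
    }
    memcpy(dst, src, n * sizeof(double));
    return dst;
}
```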

A step that would be useful sooner rather than later is splitting
numpy into smaller extensions (instead of essentially just multiarray
and ufunc). This would avoid recompiling lots of code for every small
change. That is already quite painful with C, but with C++ it will be
unbearable. The split can be done in C, and would be useful whether or
not the move to C++ is accepted.

cheers,

David


