[Numpy-discussion] How a transition to C++ could work

Mark Wiebe mwwiebe at gmail.com
Sun Feb 19 05:49:29 EST 2012


On Sun, Feb 19, 2012 at 4:30 AM, Christopher Jordan-Squire
<cjordan1 at uw.edu> wrote:

> On Sun, Feb 19, 2012 at 2:14 AM, David Cournapeau <cournape at gmail.com>
> wrote:
> > On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> >> On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh <ben_w_123 at yahoo.co.uk>
> >> wrote:
> >>>
> >>>
> >>>
> >>> > Date: Sun, 19 Feb 2012 01:18:20 -0600
> >>> > From: Mark Wiebe <mwwiebe at gmail.com>
> >>> > Subject: [Numpy-discussion] How a transition to C++ could work
> >>> > To: Discussion of Numerical Python <NumPy-Discussion at scipy.org>
> >>> > Message-ID:
> >>> >
> >>> > <CAMRnEmpVTmt=KduRpZKtgUi516oQtqD4vAzm746HmpqgpFXNqQ at mail.gmail.com>
> >>> > Content-Type: text/plain; charset="utf-8"
> >>> >
> >>> > The suggestion of transitioning the NumPy core code from C to C++ has
> >>> > sparked a vigorous debate, and I thought I'd start a new thread to
> >>> > give my perspective on some of the issues raised, and describe how
> >>> > such a transition could occur.
> >>> >
> >>> > First, I'd like to reiterate the gcc rationale for their choice to
> >>> > switch:
> >>> > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
> >>> >
> >>> > In particular, these points deserve emphasis:
> >>> >
> >>> >   - The C subset of C++ is just as efficient as C.
> >>> >   - C++ supports cleaner code in several significant cases.
> >>> >   - C++ makes it easier to write cleaner interfaces by making it
> >>> >   harder to break interface boundaries.
> >>> >   - C++ never requires uglier code.
> >>> >
> >>>
> >>> I think they're trying to solve a different problem.
> >>>
> >>> I thought the problem that numpy was trying to solve is "make inner
> >>> loops of numerical algorithms very fast". C is great for this because
> >>> you can write C code and picture precisely what assembly code will be
> >>> generated.
> >>
> >>
> >> What you're describing is also the C subset of C++, so your experience
> >> applies just as well to C++!
> >>
> >>>
> >>> C++ removes some of this advantage -- now there is extra code
> >>> generated by the compiler to handle constructors, destructors,
> >>> operators etc which can make a material difference to fast inner
> >>> loops. So you end up just writing "C-style" anyway.
> >>
> >>
> >> This is in fact not true, and writing in C++ style can often produce
> >> faster code. A classic example of this is C qsort vs C++ std::sort. You
> >> may be thinking of using virtual functions in a class hierarchy, where a
> >> tradeoff between performance and run-time polymorphism is being done.
> >> Emulating the functionality that virtual functions provide in C will
> >> give similar performance characteristics as the C++ language feature
> >> itself.
> >>
> >>>
> >>> On the other hand, if your problem really is "write lots of OO code
> >>> with virtual methods and have it turned into machine code" (probably
> >>> like the GCC guys) then maybe C++ is the way to go.
> >>
> >>
> >> Managing the complexity of the dtype subsystem, the ufunc subsystem, the
> >> nditer component, and other parts of NumPy could benefit from C++. Not
> >> in a stereotypical "OO code with virtual methods" way; that is not how
> >> typical modern C++ is done.
> >>
> >>>
> >>> Some more opinions on C++:
> >>> http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/
> >>>
> >>> Sorry if this all seems a bit negative about C++. It's just been my
> >>> experience that C++ adds complexity while C keeps things nice and
> >>> simple.
> >>
> >>
> >> Yes, there are lots of negative opinions about C++ out there, it's true.
> >> Just like there are negative opinions about C, Java, C#, and any other
> >> language which has become popular. My experience with regard to
> >> complexity and C vs C++ is that C forces the complexity of dealing with
> >> resource lifetimes out into all the code everyone writes, while C++
> >> allows one to encapsulate that sort of complexity into a class which is
> >> small and more easily verifiable. This is about code quality, and the
> >> best quality C++ code I've worked with has been way easier to program in
> >> than the best quality C code I've worked with.
> >
> > While I actually believe this to be true (very good C++ can be easier
> > to read/use than very good C), good C is also much more common than
> > good C++, at least in open source.
> >
> > On the good C++ codebases you have been working on, could you rely on
> > everybody being a very good C++ programmer? Because this will most
> > likely never happen for numpy. This is the crux of the argument from
> > an organizational POV: the variance in C++ code quality is much more
> > difficult to control. I have seen C++ code that is certainly much
> > poorer and more complex than numpy, to a point where not much could be
> > done to save the codebase.
> >
>
> Can this possibly be extended to the following: How will Mark's
> (extensive) experience about performance and long-term consequences of
> design decisions be communicated to future developers? We not only
> want new numpy developers, we want them to write good code without
> unintentional performance regressions. It seems like something more
> than just code guidelines would be required.
>

I've tried to set an example with the NEPs I've written: the NEPs for both
the nditer and the NA functionality are very long and detailed. Some
documents giving general code tours of NumPy would also be very helpful,
and this kind of document could communicate both the current code and the
direction in which it might evolve. It might also be worth creating a
performance test suite to protect against performance regressions. Wes
McKinney has made some noise in that direction
(http://wesmckinney.com/blog/?p=373).


> There's also the issue that C++ compilation error messages can be
> awful and disheartening. Are there ways of making them not as bad by
> following certain coding styles, or is that baked in? (I know clang is
> moving towards making them much better, though.)
>

Yes, this is a problem. Clang has already made this a lot better than the
status quo, if you have the good fortune of using it. There are also ways
of making the messages less bad. The Boost library developers, for example,
have put a lot of thought into this issue and came up with the Boost static
assert library as one mechanism to help improve error messages. C++11
introduces static_assert as a language feature motivated by that
experience.
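
As a minimal sketch of how static_assert can help (assuming a C++11
compiler; sum_elements is a hypothetical helper made up for illustration,
not NumPy code), a template can state its requirement on the element type
up front, so that a misuse is reported with the message the author chose:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper, not part of NumPy: sums n elements of a buffer.
template <typename T>
T sum_elements(const T* data, std::size_t n) {
    // Compile-time check: instantiating this template with a
    // non-arithmetic T stops compilation with the message below.
    static_assert(std::is_arithmetic<T>::value,
                  "sum_elements requires an arithmetic element type");

    // Run-time check, shown for contrast with the compile-time one.
    assert(data != nullptr || n == 0);

    T total = T();  // value-initialized to zero for arithmetic types
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Calling sum_elements on an array of double or int compiles normally, while
instantiating it with a type such as std::string fails to compile and
reports the message given to static_assert.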

Cheers,
Mark


> -Chris
>
> > cheers,
> >
> > David
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion