[Numpy-discussion] Proposed Roadmap Overview

Sat Feb 18 19:12:56 EST 2012

On Sat, Feb 18, 2012 at 10:54 PM, Travis Oliphant <travis at continuum.io> wrote:
> I'm reading very carefully any arguments against using C++ because I've actually pushed back on Mark pretty hard as we've discussed these things over the past months. I am nervous about corner use-cases that will be unpleasant for some groups and some platforms. But that vague nervousness is not enough to discount the clear benefits. I'm curious about the state of C++ compilers for Blue-Gene and other big-iron machines as well. My impression is that most of them use g++, which has pretty good support for C++. David and others raised some important concerns (merging multiple compilers seems like the biggest issue --- it already is...). If someone out there seriously opposes judicious and careful use of C++ and can show a clear reason why it would be harmful --- feel free to speak up at any time. We are leaning that way with Mark out in front of us leading the charge.

I don't oppose it, but I admit I'm not really clear on what the
supposed advantages would be. Everyone seems to agree that
  -- Only a carefully-chosen subset of C++ features should be used
  -- But this subset would be pretty useful
I wonder if anyone is actually thinking of the same subset :-).

Chuck mentioned iterators as one advantage. I don't understand, since
iterators aren't even a C++ feature, they're just objects with "next"
and "dereference" operators. The only difference between the C++ and C
versions is spelling:
  for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... }
  for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i);
       my_iter_next(&i)) { ... }
So I assume he's thinking about something more, but the discussion has
been too high-level for me to figure out what.
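
To make the comparison concrete, here is a minimal sketch of the two
spellings doing the same job (summing a sequence of ints). The names
my_iter_deref, sum_c_style, sum_cpp_style and demo_iterators are
hypothetical and exist only for this illustration; the C-style
iterator is assumed to walk a plain int array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style iterator over an int array (illustrative only).
struct my_iter {
    const int *cur;
    const int *end;
};

static my_iter my_iter_begin(const int *data, std::size_t n) {
    my_iter it = {data, data + n};
    return it;
}
static bool my_iter_ended(const my_iter *it) { return it->cur == it->end; }
static void my_iter_next(my_iter *it) { ++it->cur; }
static int my_iter_deref(const my_iter *it) { return *it->cur; }

// Sum using the C-style spelling: free functions taking a pointer.
int sum_c_style(const int *data, std::size_t n) {
    int total = 0;
    for (my_iter i = my_iter_begin(data, n); !my_iter_ended(&i);
         my_iter_next(&i)) {
        total += my_iter_deref(&i);
    }
    return total;
}

// Sum using the C++ spelling: begin()/end(), operator++ and operator*.
int sum_cpp_style(const std::vector<int> &v) {
    int total = 0;
    for (std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i) {
        total += *i;
    }
    return total;
}

// Both spellings visit the same elements in the same order.
void demo_iterators() {
    const int raw[3] = {1, 2, 3};
    std::vector<int> v(raw, raw + 3);
    assert(sum_c_style(raw, 3) == sum_cpp_style(v));
}
```

The loop bodies are structurally identical; what differs is whether
"next" and "dereference" are spelled as operators or as named
functions.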

Using C++ templates to generate ufunc loops is an obvious application,
but again, in the simple examples I'm thinking of (e.g., the stuff in
numpy/core/src/umath/loops.c.src), this pretty much comes down to
whether we want to spell the function names like "SHORT_add" or
"add<short>", and write the code like "((T *)x)[0] + ((T *)y)[0]" or
"((@TYPE@ *)x)[0] + ((@TYPE@ *)y)[0]". Maybe there are other places
where we'd get some advantage from the compiler knowing what was going
on, like if we're doing type-based dispatch to overloaded functions,
but I don't know if that'd be useful for the templates we actually
use.
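
As a rough sketch of the template spelling, the following shows one
templated "add" loop instantiated for two types. The signature is a
simplified stand-in chosen for this example, not NumPy's actual inner
loop signature, and add_loop and demo_add_loop are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for a ufunc inner loop (an assumption for this
// sketch): args[0] and args[1] are inputs, args[2] is the output, and
// steps[] holds the byte stride of each array.
template <typename T>
void add_loop(char **args, std::ptrdiff_t n, const std::ptrdiff_t *steps) {
    char *x = args[0];
    char *y = args[1];
    char *out = args[2];
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        ((T *)out)[0] = static_cast<T>(((T *)x)[0] + ((T *)y)[0]);
        x += steps[0];
        y += steps[1];
        out += steps[2];
    }
}

// Explicit instantiations play the role that generated names such as
// SHORT_add and DOUBLE_add play in the .c.src approach.
template void add_loop<short>(char **, std::ptrdiff_t, const std::ptrdiff_t *);
template void add_loop<double>(char **, std::ptrdiff_t, const std::ptrdiff_t *);

// Small self-check using contiguous short arrays.
void demo_add_loop() {
    short a[3] = {1, 2, 3};
    short b[3] = {10, 20, 30};
    short c[3] = {0, 0, 0};
    char *args[3] = {(char *)a, (char *)b, (char *)c};
    const std::ptrdiff_t s = sizeof(short);
    const std::ptrdiff_t steps[3] = {s, s, s};
    add_loop<short>(args, 3, steps);
    assert(c[0] == 11 && c[1] == 22 && c[2] == 33);
}
```

The loop body is essentially the same text as the @TYPE@ version,
which is the point made above: for loops this simple, the choice is
mostly one of spelling.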

RAII is pretty awesome, and RAII smart-pointers might help a lot with
getting reference-counting right. OTOH, you really only need RAII if
you're using exceptions; otherwise, the goto-failure pattern usually
works pretty well, esp. if used systematically.
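
For comparison, here is a minimal sketch of the two patterns side by
side. Obj, incref, decref and Ref are hypothetical stand-ins for a
reference-counted object and its helpers; they are not the Python C
API, and no exceptions are thrown in this example.

```cpp
#include <cassert>

// Hypothetical reference-counted object (a stand-in, not PyObject).
struct Obj {
    int refcount;
};
static void incref(Obj *o) { ++o->refcount; }
static void decref(Obj *o) { --o->refcount; }

// goto-failure spelling: every exit path funnels through one label
// that releases whatever was acquired.
int use_goto(Obj *a, Obj *b, bool fail_midway) {
    int status = -1;
    incref(a);
    incref(b);
    if (fail_midway) {
        goto done;
    }
    status = 0;
done:
    decref(b);
    decref(a);
    return status;
}

// RAII spelling: the destructor releases the reference on every exit
// path, including early returns.
class Ref {
public:
    explicit Ref(Obj *o) : obj_(o) { incref(obj_); }
    ~Ref() { decref(obj_); }

private:
    Obj *obj_;
    Ref(const Ref &);             // non-copyable: declared, not defined
    Ref &operator=(const Ref &);
};

int use_raii(Obj *a, Obj *b, bool fail_midway) {
    Ref ra(a);
    Ref rb(b);
    if (fail_midway) {
        return -1;  // no explicit cleanup needed here
    }
    return 0;
}

// Both versions leave the reference counts balanced on failure.
void demo_refs() {
    Obj a = {1};
    Obj b = {1};
    assert(use_goto(&a, &b, true) == -1);
    assert(use_raii(&a, &b, true) == -1);
    assert(a.refcount == 1 && b.refcount == 1);
}
```

Without exceptions, both functions have the same observable
behaviour; the RAII version simply moves the cleanup from a label
into a destructor.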

Do we know that the Python memory allocator plays well with the C++
allocation interfaces on all relevant systems? (Potentially you have
to know for every pointer whether it was allocated by new, new[],
malloc, or PyMem_Malloc, because they all have different deallocation
functions. This is already an issue for malloc versus PyMem_Malloc,
but C++ makes it worse.)
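
One way to keep track of the pairing, sketched here under the
assumption that C++11's std::unique_ptr is available, is to encode
the deallocator in the pointer's type. FreeDeleter, MallocBuffer,
NewArrayBuffer and the helper functions are hypothetical names for
this example; memory from PyMem_Malloc would need an analogous
deleter that calls PyMem_Free, which is omitted so the sketch builds
without the Python headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Deleter for memory obtained from malloc: it must be released with free.
struct FreeDeleter {
    void operator()(void *p) const { std::free(p); }
};

// The pointer type records which deallocation function applies.
typedef std::unique_ptr<double, FreeDeleter> MallocBuffer;  // malloc / free
typedef std::unique_ptr<double[]> NewArrayBuffer;           // new[] / delete[]

MallocBuffer make_malloc_buffer(std::size_t n) {
    return MallocBuffer(
        static_cast<double *>(std::malloc(n * sizeof(double))));
}

NewArrayBuffer make_new_buffer(std::size_t n) {
    return NewArrayBuffer(new double[n]());  // value-initialized to 0.0
}

void demo_buffers() {
    MallocBuffer a = make_malloc_buffer(4);
    NewArrayBuffer b = make_new_buffer(4);
    assert(a.get() != 0);
    a.get()[0] = 1.0;
    assert(a.get()[0] == 1.0);
    assert(b[0] == 0.0);
}  // a is released with free, b with delete[]
```

This does not answer whether the allocators interact well on every
platform; it only shows that the "which function frees this pointer"
bookkeeping can be pushed into the type system.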

Again, it really doesn't matter to me personally which approach is
chosen. But getting more concrete might be useful...

-- Nathaniel