Re: [Numpy-discussion] Proposed Roadmap Overview

Feb. 20, 2012

Charles R Harris wrote:
On Fri, Feb 17, 2012 at 12:09 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Fri, Feb 17, 2012 at 1:00 PM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:
On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <efiring@hawaii.edu> wrote:
On 02/17/2012 05:39 AM, Charles R Harris wrote:
On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau <cournape@gmail.com> wrote:
Hi Travis,

On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant <travis@continuum.io> wrote:
> Mark Wiebe and I have been discussing off and on (as well as talking
> with Charles) a good way forward to balance two competing desires:
>
>   * addition of new features that are needed in NumPy
>   * improving the code-base generally and moving towards a more
>     maintainable NumPy
>
> I know there are loud voices for just focusing on the second of these
> and avoiding the first until we have finished that.  I recognize the
> need to improve the code base, but I will also be pushing for
> improvements to the feature-set and user experience in the process.
>
> As a result, I am proposing a rough outline for releases over the
> next year:
>
>   * NumPy 1.7 to come out as soon as the serious bugs can be
>     eliminated.  Bryan, Francesc, Mark, and I are able to help triage
>     some of those.
>
>   * NumPy 1.8 to come out in July which will have as many
>     ABI-compatible feature enhancements as we can add while improving
>     test coverage and code cleanup.  I will post to this list more
>     details of what we plan to address with it later.  Included for
>     possible inclusion are:
>       * resolving the NA/missing-data issues
>       * finishing group-by
>       * incorporating the start of label arrays
>       * incorporating a meta-object
>       * a few new dtypes (variable-length string, variable-length
>         unicode and an enum type)
>       * adding ufunc support for flexible dtypes and possibly
>         structured arrays
>       * allowing generalized ufuncs to work on more kinds of arrays
>         besides just contiguous
>       * improving the ability for NumPy to receive JIT-generated
>         function pointers for ufuncs and other calculation
>         opportunities
>       * adding "filters" to Input and Output
>       * simple computed fields for dtypes
>       * accepting a Data-Type specification as a class or JSON file
>       * work towards improving the dtype-addition mechanism
>       * re-factoring of code so that it can compile with a C++
>         compiler and be minimally dependent on Python
>         data-structures.

This is a pretty exciting list of features. What is the rationale for
code being compiled as C++? IMO, it will be difficult to do so without
preventing useful C constructs, and without removing some of the
existing features (like our use of C99 complex). The subset that is
both C and C++ compatible is quite constraining.

I'm in favor of this myself. C++ would allow a lot of code cleanup and
make it easier to provide an extensible base; I think it would be a
natural fit with numpy. Of course, some C++ projects become tangled
messes of inheritance, but I'd be very interested in seeing what a good
C++ designer like Mark, intimately familiar with the numpy code base,
could do. This opportunity might not come by again anytime soon and I
think we should grab onto it. The initial step would be a release whose
code would compile in both C and C++, which mostly comes down to
removing C++ keywords like 'new'.
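
As a rough illustration of what "compiles in both C and C++" asks of
existing C code, here is a minimal sketch. The struct and function
names are hypothetical and are not taken from the numpy code base. The
explicit cast on malloc is a second, well-known C/C++ difference that
the message above does not mention; it is included here as an
assumption about what such a cleanup would also touch.

```cpp
#include <stdlib.h>

/* Hypothetical type, used only for illustration. */
struct item {
    int value;
};

/*
 * Valid C, but rejected by a C++ compiler for two reasons:
 *
 *     struct item *new = malloc(sizeof *new);
 *
 * 'new' is a C++ keyword, and C++ does not implicitly convert the
 * void pointer returned by malloc to another pointer type.
 */

/* Accepted by both C and C++ compilers. */
struct item *item_create(int value)
{
    struct item *fresh = (struct item *)malloc(sizeof *fresh);
    if (fresh != NULL) {
        fresh->value = value;
    }
    return fresh;
}

void item_destroy(struct item *it)
{
    free(it);
}
```

The rename and the cast are mechanical, which fits the description of
this step as mostly a matter of removing keywords.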

I did suggest running it by you for build issues, so please raise any
you can think of. Note that MatPlotLib is in C++, so I don't think the
problems are insurmountable. And choosing a set of compilers to support
is something that will need to be done.

It's true that matplotlib relies heavily on C++, both via the Agg
library and in its own extension code.  Personally, I don't like this;
I think it raises the barrier to contributing.  C++ is an order of
magnitude more complicated than C--harder to read, and much harder to
write, unless one is a true expert. In mpl it brings reliance on the
CXX library, which Mike D. has had to help maintain.  And if it does
increase compiler specificity, that's bad.

On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
This gets to the recruitment issue, which is one of the most important
problems I see numpy facing. I personally have contributed a lot of
code to NumPy *in spite of* the fact it's in C. NumPy being in C
instead of C++ was the biggest negative point when I considered whether
it was worth contributing to the project. I suspect there are many
programmers out there who are skilled in low-level, high-performance
C++, who would be willing to contribute, but don't want to code in C.

I believe NumPy should be trying to find people who want to make high
performance, close to the metal, libraries. This is a very different
type of programmer than one who wants to program in Python, but is
willing to dabble in a lower level language to make something run
faster. High performance library development is one of the things the
C++ developer community does very well, and that community is where we
have a good chance of finding the programmers NumPy needs.

I would much rather see development in the direction of sticking with C
where direct low-level control and speed are needed, and using cython
to gain higher level language benefits where appropriate.  Of course,
that brings in the danger of reliance on another complex tool, cython.
If that danger is considered excessive, then just stick with C.

There are many small benefits C++ can offer, even if numpy chooses only
to use a tiny subset of the C++ language. For example, RAII can be used
to reliably eliminate PyObject reference leaks.
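
To make the RAII point concrete, here is a minimal sketch of a guard
object that releases a reference on every exit path. It deliberately
uses a mock reference-counted struct rather than the real CPython API
so that it stands alone; MockObject, mock_incref, mock_decref, OwnedRef
and process are all hypothetical names. In real extension code the
destructor would presumably call Py_XDECREF instead.

```cpp
// Stand-in for a reference-counted object such as PyObject.
// This is a mock for illustration, not the CPython API.
struct MockObject {
    int refcount;
};

inline void mock_incref(MockObject *o) { ++o->refcount; }
inline void mock_decref(MockObject *o) { --o->refcount; }

// Owns exactly one reference and gives it back in the destructor,
// so no exit path can forget the matching decref.
class OwnedRef {
public:
    // Takes over a reference the caller already holds.
    explicit OwnedRef(MockObject *obj) : obj_(obj) {}
    ~OwnedRef() {
        if (obj_) {
            mock_decref(obj_);
        }
    }

    // Copying would release the same reference twice, so forbid it.
    OwnedRef(const OwnedRef &) = delete;
    OwnedRef &operator=(const OwnedRef &) = delete;

    MockObject *get() const { return obj_; }

private:
    MockObject *obj_;
};

// A function with several return statements. In C, each one would
// need its own decref; here the guard covers all of them.
bool process(MockObject *obj, int mode)
{
    mock_incref(obj);
    OwnedRef guard(obj);

    if (mode < 0) {
        return false;  // early error return, reference still released
    }
    if (mode == 0) {
        return true;
    }
    return guard.get()->refcount > 1;
}
```

Whichever branch is taken, the reference count of the object is the
same after the call as before it.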

Consider a regression like this:
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html

Fixing this in C would require switching all the relevant usages of
NPY_MAXARGS to use a dynamic memory allocation. This brings with it the
potential of easily introducing a memory leak, and is a lot of work to
do. In C++, this functionality could be placed inside a class, where
the deterministic construction/destruction semantics eliminate the risk
of memory leaks and make the code easier to read at the same time.
There are other examples like this where the C language has forced a
suboptimal design choice because of how hard it would be to do it
better.
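
One way such a class could look is sketched below: a buffer that uses
fixed inline storage for small sizes and falls back to the heap for
larger ones, freeing the heap storage in its destructor. This is an
illustration under assumptions, not numpy code. SmallBuffer, sum_first
and the inline capacity of 4 are hypothetical; the capacity merely
stands in for a fixed limit such as NPY_MAXARGS.

```cpp
#include <cstddef>

// Buffer with InlineN elements of inline storage. Larger requests
// are served from the heap and released automatically.
template <typename T, std::size_t InlineN>
class SmallBuffer {
public:
    explicit SmallBuffer(std::size_t n)
        : size_(n), data_(n <= InlineN ? inline_ : new T[n]) {}

    ~SmallBuffer() {
        if (data_ != inline_) {
            delete[] data_;
        }
    }

    // Copying would lead to a double delete, so forbid it.
    SmallBuffer(const SmallBuffer &) = delete;
    SmallBuffer &operator=(const SmallBuffer &) = delete;

    T &operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    bool uses_heap() const { return data_ != inline_; }

private:
    T inline_[InlineN];  // declared first so data_ can point at it
    std::size_t size_;
    T *data_;
};

// Usage: the caller no longer cares whether n exceeds the inline
// limit, and there is no free() to forget on any return path.
long sum_first(std::size_t n)
{
    SmallBuffer<long, 4> scratch(n);
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = static_cast<long>(i + 1);
    }
    long total = 0;
    for (std::size_t i = 0; i < scratch.size(); ++i) {
        total += scratch[i];
    }
    return total;  // heap storage, if any, is released here
}
```

The common small case avoids heap allocation entirely, while the large
case is no longer limited by a compile-time constant.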
Cheers,
Mark

In a similar vein, could incorporating C++ lead to a simpler low-level
API for numpy? I know Mark has talked before about--in the long-term,
as a dream project to scratch his own itch, and something the BDF12
doesn't necessarily agree with--implementing the great ideas in numpy
as a layered C++ library. (Which would have the added benefit of
making numpy more of a general array library that could be exposed to
any language which can call C++ libraries.)
I don't imagine that's on the table for anything near-term, but I
wonder if making more of the low-level stuff C++ would make it easier
for performance nuts to write their own code in C/C++ interfacing with
numpy, and then expose it to python. After playing around with ufuncs
at the C level for a little while last summer, I quickly realized any
simplifications would be greatly appreciated.
-Chris

I am also in favor of moving towards a C++ oriented library.  Personally,
I find C++ easier to read and understand, most likely because I learned it
first.  I only learned C in the context of learning C++.
Just a thought, with the upcoming revisions to the C++ standard, this does
open up the possibility of some nice templating features that would make
the library easier to use in native C++ programs.  On a side note, does
anybody use std::valarray?

My impression is that std::valarray didn't really solve the problems it was
intended to solve. IIRC, the valarray author himself said as much, but I
don't recall where.
Chuck

A related question is whether numpy core in c++ would be based on any existing 
c++ libs for HPC.  There are quite a few efforts for 1 and 2 dimensions.  Fewer 
for arbitrary (or arbitrary up to some reasonable limit) dimension.  Or, would 
we be talking about purely custom c++ code for numpy?

I suspect the latter.  Although there are many promising c++ matrix/vector type 
libraries (too many), I suspect it would be too difficult to preserve all numpy 
semantics via this route.

Neal Becker