[Numpy-discussion] Proposed Roadmap Overview
Benjamin Root
ben.root at ou.edu
Fri Feb 17 14:09:36 EST 2012
On Fri, Feb 17, 2012 at 1:00 PM, Christopher Jordan-Squire
<cjordan1 at uw.edu> wrote:
> On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <efiring at hawaii.edu> wrote:
> >>
> >> On 02/17/2012 05:39 AM, Charles R Harris wrote:
> >> >
> >> >
> >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau
> >> > <cournape at gmail.com> wrote:
> >> >
> >> > Hi Travis,
> >> >
> >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant
> >> > <travis at continuum.io> wrote:
> >> > > Mark Wiebe and I have been discussing off and on (as well as
> >> > talking with Charles) a good way forward to balance two competing
> >> > desires:
> >> > >
> >> > > * addition of new features that are needed in NumPy
> >> > > * improving the code-base generally and moving towards a
> >> > more maintainable NumPy
> >> > >
> >> > > I know there are loud voices for just focusing on the second of
> >> > these and avoiding the first until we have finished that. I
> >> > recognize the need to improve the code base, but I will also be
> >> > pushing for improvements to the feature-set and user experience in
> >> > the process.
> >> > >
> >> > > As a result, I am proposing a rough outline for releases over the
> >> > next year:
> >> > >
> >> > > * NumPy 1.7 to come out as soon as the serious bugs can be
> >> > eliminated. Bryan, Francesc, Mark, and I are able to help triage
> >> > some of those.
> >> > >
> >> > > * NumPy 1.8 to come out in July which will have as many
> >> > ABI-compatible feature enhancements as we can add while improving
> >> > test coverage and code cleanup. I will post to this list more
> >> > details of what we plan to address with it later. Candidates for
> >> > inclusion are:
> >> > > * resolving the NA/missing-data issues
> >> > > * finishing group-by
> >> > > * incorporating the start of label arrays
> >> > > * incorporating a meta-object
> >> > > * a few new dtypes (variable-length string,
> >> > variable-length unicode and an enum type)
> >> > > * adding ufunc support for flexible dtypes and possibly
> >> > structured arrays
> >> > > * allowing generalized ufuncs to work on more kinds of
> >> > arrays besides just contiguous
> >> > > * improving the ability for NumPy to receive JIT-generated
> >> > function pointers for ufuncs and other calculation opportunities
> >> > > * adding "filters" to Input and Output
> >> > > * simple computed fields for dtypes
> >> > > * accepting a Data-Type specification as a class or JSON
> >> > file
> >> > > * work towards improving the dtype-addition mechanism
> >> > > * re-factoring of code so that it can compile with a C++
> >> > compiler and be minimally dependent on Python data-structures.
> >> >
> >> > This is a pretty exciting list of features. What is the rationale
> >> > for code being compiled as C++? IMO, it will be difficult to do so
> >> > without preventing useful C constructs, and without removing some of
> >> > the existing features (like our use of C99 complex). The subset that
> >> > is both C and C++ compatible is quite constraining.
> >> >
> >> >
> >> > I'm in favor of this myself. C++ would allow a lot of code cleanup and
> >> > make it easier to provide an extensible base; I think it would be a
> >> > natural fit with numpy. Of course, some C++ projects become tangled
> >> > messes of inheritance, but I'd be very interested in seeing what a good
> >> > C++ designer like Mark, intimately familiar with the numpy code base,
> >> > could do. This opportunity might not come by again anytime soon and I
> >> > think we should grab onto it. The initial step would be a release whose
> >> > code would compile as both C and C++, which mostly comes down to
> >> > removing uses of C++ keywords like 'new' as identifiers.
> >> >
> >> > I did suggest running it by you for build issues, so please raise any
> >> > you can think of. Note that MatPlotLib is in C++, so I don't think the
> >> > problems are insurmountable. And choosing a set of compilers to support
> >> > is something that will need to be done.
> >>
> >> It's true that matplotlib relies heavily on C++, both via the Agg
> >> library and in its own extension code. Personally, I don't like this; I
> >> think it raises the barrier to contributing. C++ is an order of
> >> magnitude more complicated than C--harder to read, and much harder to
> >> write, unless one is a true expert. In mpl it brings reliance on the CXX
> >> library, which Mike D. has had to help maintain. And if it does
> >> increase compiler specificity, that's bad.
> >
> >
> > This gets to the recruitment issue, which is one of the most important
> > problems I see numpy facing. I personally have contributed a lot of code
> > to NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++
> > was the biggest negative point when I considered whether it was worth
> > contributing to the project. I suspect there are many programmers out
> > there who are skilled in low-level, high-performance C++, who would be
> > willing to contribute, but don't want to code in C.
> >
> > I believe NumPy should be trying to find people who want to make high
> > performance, close to the metal, libraries. This is a very different type
> > of programmer than one who wants to program in Python, but is willing to
> > dabble in a lower level language to make something run faster. High
> > performance library development is one of the things the C++ developer
> > community does very well, and that community is where we have a good
> > chance of finding the programmers NumPy needs.
> >
> >> I would much rather see development in the direction of sticking with C
> >> where direct low-level control and speed are needed, and using cython to
> >> gain higher level language benefits where appropriate. Of course, that
> >> brings in the danger of reliance on another complex tool, cython. If
> >> that danger is considered excessive, then just stick with C.
> >
> >
> > There are many small benefits C++ can offer, even if numpy chooses only
> > to use a tiny subset of the C++ language. For example, RAII can be used to
> > reliably eliminate PyObject reference leaks.
> >
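To make the RAII point concrete, here is a minimal sketch of a reference guard. It is an illustration only, not NumPy or CPython code: `Obj`, `incref`, `decref`, `Ref` and `use` are hypothetical stand-ins (with `incref`/`decref` playing the role of Py_INCREF/Py_DECREF) so that the snippet compiles without the Python headers.

```cpp
#include <cassert>

// Stand-in for PyObject: just a reference count. Hypothetical, for illustration.
struct Obj {
    int refcnt;
};

inline void incref(Obj* o) { ++o->refcnt; }   // plays the role of Py_INCREF
inline void decref(Obj* o) {                  // plays the role of Py_DECREF
    assert(o->refcnt > 0);
    --o->refcnt;                              // (no deallocation in this sketch)
}

// Owns exactly one reference and releases it in the destructor.
class Ref {
public:
    explicit Ref(Obj* o) : obj_(o) {}         // takes over an already-owned reference
    ~Ref() { if (obj_) decref(obj_); }
    Ref(const Ref&) = delete;                 // non-copyable: one owner per reference
    Ref& operator=(const Ref&) = delete;
    Obj* get() const { return obj_; }
private:
    Obj* obj_;
};

// Every return path, including the early error return, drops the reference.
int use(Obj* o, bool fail_early) {
    incref(o);
    Ref guard(o);
    if (fail_early) {
        return -1;                            // no manual decref needed here
    }
    return guard.get()->refcnt;               // guard releases after the value is read
}
```

The equivalent C function needs a decref (or a `goto fail` cleanup label) on each exit path, and a forgotten one is exactly the kind of leak being described.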
> > Consider a regression like this:
> > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html
> >
> > Fixing this in C would require switching all the relevant usages of
> > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the
> > potential of easily introducing a memory leak, and is a lot of work to do.
> > In C++, this functionality could be placed inside a class, where the
> > deterministic construction/destruction semantics eliminate the risk of
> > memory leaks and make the code easier to read at the same time. There are
> > other examples like this where the C language has forced a suboptimal
> > design choice because of how hard it would be to do it better.
> >
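One way such a class might look, as a rough sketch under my own assumptions rather than anything proposed in the thread: `ArgBuffer` and `sum_indices` are hypothetical names, and the inline capacity of 32 is an arbitrary value picked for illustration. Small counts use storage inside the object; larger counts fall back to the heap, and the destructor frees it on every exit path.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of NumPy: holds n slots, using inline
// storage when n <= InlineN and a heap allocation otherwise.
template <typename T, std::size_t InlineN>
class ArgBuffer {
public:
    explicit ArgBuffer(std::size_t n)
        : size_(n), inline_(), data_(n <= InlineN ? inline_ : new T[n]()) {}
    ~ArgBuffer() { if (data_ != inline_) delete[] data_; }   // frees only heap storage
    ArgBuffer(const ArgBuffer&) = delete;
    ArgBuffer& operator=(const ArgBuffer&) = delete;

    T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return data_ != inline_; }
private:
    std::size_t size_;
    T inline_[InlineN];   // used when the count is small
    T* data_;             // points at inline_ or at the heap block
};

// Works for any n: there is no fixed upper bound and nothing to free by hand.
long sum_indices(std::size_t n) {
    ArgBuffer<long, 32> buf(n);
    for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<long>(i);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += buf[i];
    return total;
}
```

Callers such as `sum_indices` never see the allocation decision, which is the readability gain claimed above.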
> > Cheers,
> > Mark
> >
>
> In a similar vein, could incorporating C++ lead to a simpler low-level
> API for numpy? I know Mark has talked before about--in the long-term,
> as a dream project to scratch his own itch, and something the BDF12
> doesn't necessarily agree with--implementing the great ideas in numpy
> as a layered C++ library. (Which would have the added benefit of
> making numpy more of a general array library that could be exposed to
> any language which can call C++ libraries.)
>
> I don't imagine that's on the table for anything near-term, but I
> wonder if making more of the low-level stuff C++ would make it easier
> for performance nuts to write their own code in C/C++ interfacing with
> numpy, and then expose it to python. After playing around with ufuncs
> at the C level for a little while last summer, I quickly realized any
> simplifications would be greatly appreciated.
>
> -Chris
>
>
>
I am also in favor of moving towards a C++-oriented library. Personally, I
find C++ easier to read and understand, most likely because I learned it
first. I only learned C in the context of learning C++.
Just a thought: with the upcoming revisions to the C++ standard, this does
open up the possibility of some nice templating features that would make
the library easier to use in native C++ programs. On a side note, does
anybody use std::valarray?
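
For anyone who has not come across it, a small sketch of what std::valarray offers: element-wise arithmetic without explicit loops, loosely similar in spirit to a NumPy expression. The function name `demo_valarray` and the values are made up for illustration, and the initializer-list constructors assume a C++11 compiler.

```cpp
#include <cassert>
#include <valarray>

// Element-wise arithmetic on whole arrays, with no hand-written loop.
double demo_valarray() {
    std::valarray<double> a = {1.0, 2.0, 3.0};
    std::valarray<double> b = {10.0, 20.0, 30.0};
    std::valarray<double> c = a * 2.0 + b;   // element-wise: {12, 24, 36}
    assert(c.size() == 3);
    return c.sum();                          // 12 + 24 + 36
}
```

It is one-dimensional only and has no notion of dtypes or strides, so it is a point of comparison rather than a substitute for ndarray.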
Cheers!
Ben Root