[Numpy-discussion] Proposed Roadmap Overview

Christopher Hanley chanley at gmail.com
Fri Feb 17 15:54:08 EST 2012


On Fri, Feb 17, 2012 at 3:38 PM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:

>
>
> On Fri, Feb 17, 2012 at 8:31 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>
>> On Fri, Feb 17, 2012 at 11:00 AM, Christopher Jordan-Squire <
>> cjordan1 at uw.edu> wrote:
>>
>>> On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>>> > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <efiring at hawaii.edu>
>>> wrote:
>>> >>
>>> >> On 02/17/2012 05:39 AM, Charles R Harris wrote:
>>> >> >
>>> >> >
>>> >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau <
>>> cournape at gmail.com
>>> >> > <mailto:cournape at gmail.com>> wrote:
>>> >> >
>>> >> >     Hi Travis,
>>> >> >
>>> >> >     On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant
>>> >> >     <travis at continuum.io <mailto:travis at continuum.io>> wrote:
>>> >> >      > Mark Wiebe and I have been discussing off and on (as well as
>>> >> >     talking with Charles) a good way forward to balance two
>>> competing
>>> >> >     desires:
>>> >> >      >
>>> >> >      >        * addition of new features that are needed in NumPy
>>> >> >      >        * improving the code-base generally and moving
>>> towards a
>>> >> >     more maintainable NumPy
>>> >> >      >
>>> >> >      > I know there are loud voices for just focusing on the second
>>> of
>>> >> >     these and avoiding the first until we have finished that.  I
>>> >> >     recognize the need to improve the code base, but I will also be
>>> >> >     pushing for improvements to the feature-set and user experience
>>> in
>>> >> >     the process.
>>> >> >      >
>>> >> >      > As a result, I am proposing a rough outline for releases
>>> over the
>>> >> >     next year:
>>> >> >      >
>>> >> >      >        * NumPy 1.7 to come out as soon as the serious bugs
>>> can be
>>> >> >     eliminated.  Bryan, Francesc, Mark, and I are able to help
>>> triage
>>> >> >     some of those.
>>> >> >      >
>>> >> >      >        * NumPy 1.8 to come out in July which will have as
>>> many
>>> >> >     ABI-compatible feature enhancements as we can add while
>>> improving
>>> >> >     test coverage and code cleanup.   I will post to this list more
>>> >> >     details of what we plan to address with it later.  Candidates for
>>> >> >     inclusion are:
>>> >> >      >        * resolving the NA/missing-data issues
>>> >> >      >        * finishing group-by
>>> >> >      >        * incorporating the start of label arrays
>>> >> >      >        * incorporating a meta-object
>>> >> >      >        * a few new dtypes (variable-length string,
>>> >> >     variable-length unicode and an enum type)
>>> >> >      >        * adding ufunc support for flexible dtypes and
>>> possibly
>>> >> >     structured arrays
>>> >> >      >        * allowing generalized ufuncs to work on more kinds of
>>> >> >     arrays besides just contiguous
>>> >> >      >        * improving the ability for NumPy to receive
>>> JIT-generated
>>> >> >     function pointers for ufuncs and other calculation opportunities
>>> >> >      >        * adding "filters" to Input and Output
>>> >> >      >        * simple computed fields for dtypes
>>> >> >      >        * accepting a Data-Type specification as a class or
>>> JSON
>>> >> > file
>>> >> >      >        * work towards improving the dtype-addition mechanism
>>> >> >      >        * re-factoring of code so that it can compile with a
>>> C++
>>> >> >     compiler and be minimally dependent on Python data-structures.
>>> >> >
>>> >> >     This is a pretty exciting list of features. What is the
>>> rationale
>>> >> > for
>>> >> >     code being compiled as C++ ? IMO, it will be difficult to do so
>>> >> >     without preventing useful C constructs, and without removing
>>> some of
>>> >> >     the existing features (like our use of C99 complex). The subset
>>> that
>>> >> >     is both C and C++ compatible is quite constraining.
>>> >> >
>>> >> >
>>> >> > I'm in favor of this myself, C++ would allow a lot of code cleanup and
>>> make
>>> >> > it easier to provide an extensible base, I think it would be a
>>> natural
>>> >> > fit with numpy. Of course, some C++ projects become tangled messes
>>> of
>>> >> > inheritance, but I'd be very interested in seeing what a good C++
>>> >> > designer like Mark, intimately familiar with the numpy code base,
>>> could
>>> >> > do. This opportunity might not come by again anytime soon and I
>>> think we
>>> >> > should grab onto it. The initial step would be a release whose code
>>> >> > would compile in both C/C++, which mostly comes down to removing C++
>>> >> > keywords like 'new'.
>>> >> >
>>> >> > I did suggest running it by you for build issues, so please raise
>>> any
>>> >> > you can think of. Note that MatPlotLib is in C++, so I don't think
>>> the
>>> >> > problems are insurmountable. And choosing a set of compilers to
>>> support
>>> >> > is something that will need to be done.
>>> >>
>>> >> It's true that matplotlib relies heavily on C++, both via the Agg
>>> >> library and in its own extension code.  Personally, I don't like
>>> this; I
>>> >> think it raises the barrier to contributing.  C++ is an order of
>>> >> magnitude more complicated than C--harder to read, and much harder to
>>> >> write, unless one is a true expert. In mpl it brings reliance on the
>>> CXX
>>> >> library, which Mike D. has had to help maintain.  And if it does
>>> >> increase compiler specificity, that's bad.
>>> >
>>> >
>>> > This gets to the recruitment issue, which is one of the most important
>>> > problems I see numpy facing. I personally have contributed a lot of
>>> code to
>>> > NumPy *in spite of* the fact it's in C. NumPy being in C instead of
>>> C++ was
>>> > the biggest negative point when I considered whether it was worth
>>> > contributing to the project. I suspect there are many programmers out
>>> there
>>> > who are skilled in low-level, high-performance C++, who would be
>>> willing to
>>> > contribute, but don't want to code in C.
>>> >
>>> > I believe NumPy should be trying to find people who want to make high
>>> > performance, close to the metal, libraries. This is a very different
>>> type of
>>> > programmer than one who wants to program in Python, but is willing to
>>> dabble
>>> > in a lower level language to make something run faster. High
>>> performance
>>> > library development is one of the things the C++ developer community
>>> does
>>> > very well, and that community is where we have a good chance of
>>> finding the
>>> > programmers NumPy needs.
>>> >
>>> >> I would much rather see development in the direction of sticking with
>>> C
>>> >> where direct low-level control and speed are needed, and using cython
>>> to
>>> >> gain higher level language benefits where appropriate.  Of course,
>>> that
>>> >> brings in the danger of reliance on another complex tool, cython.  If
>>> >> that danger is considered excessive, then just stick with C.
>>> >
>>> >
>>> > There are many small benefits C++ can offer, even if numpy chooses
>>> only to
>>> > use a tiny subset of the C++ language. For example, RAII can be used to
>>> > reliably eliminate PyObject reference leaks.
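A minimal sketch of the RAII idea mentioned above, not actual NumPy code. All names here (FakeObject, incref, decref, Ref, use_object) are hypothetical; FakeObject stands in for a reference-counted PyObject so the example compiles without the Python headers.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a reference-counted PyObject.
struct FakeObject {
    int refcount = 1;
};

inline void incref(FakeObject* o) { if (o) ++o->refcount; }
inline void decref(FakeObject* o) {
    if (o) {
        assert(o->refcount > 0);  // releasing more than was acquired is a bug
        --o->refcount;
    }
}

// Owns exactly one reference and releases it when the holder is destroyed.
class Ref {
public:
    explicit Ref(FakeObject* owned = nullptr) noexcept : ptr_(owned) {}
    Ref(const Ref& other) noexcept : ptr_(other.ptr_) { incref(ptr_); }
    Ref(Ref&& other) noexcept : ptr_(other.ptr_) { other.ptr_ = nullptr; }
    Ref& operator=(Ref other) noexcept { std::swap(ptr_, other.ptr_); return *this; }
    ~Ref() { decref(ptr_); }
    FakeObject* get() const noexcept { return ptr_; }
private:
    FakeObject* ptr_;
};

// Every return path releases the reference without an explicit decref.
inline bool use_object(FakeObject* borrowed, bool fail_early) {
    incref(borrowed);
    Ref guard(borrowed);  // takes over the reference acquired on the line above
    if (fail_early) {
        return false;     // error path: guard's destructor still runs
    }
    return true;
}
```

Calling use_object on either path leaves the reference count where it started, which is the property that is easy to break in hand-written C error handling.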
>>> >
>>> > Consider a regression like this:
>>> > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html
>>> >
>>> > Fixing this in C would require switching all the relevant usages of
>>> > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the
>>> > potential of easily introducing a memory leak, and is a lot of work to
>>> do.
>>> > In C++, this functionality could be placed inside a class, where the
>>> > deterministic construction/destruction semantics eliminate the risk of
>>> > memory leaks and make the code easier to read at the same time. There
>>> are
>>> > other examples like this where the C language has forced a suboptimal
>>> design
>>> > choice because of how hard it would be to do it better.
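One way the class described above could look, as a sketch under stated assumptions rather than a proposed patch. ArgBuffer, kInlineArgs, and sum_first are hypothetical names, and kInlineArgs is an illustrative value standing in for a compile-time cap such as NPY_MAXARGS. Small sizes use storage inside the object; larger sizes fall back to the heap, and the destructor frees that block on every exit path.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Illustrative cap; stands in for a fixed limit like NPY_MAXARGS.
constexpr std::size_t kInlineArgs = 32;

// Scratch array of T with inline storage for small sizes, heap otherwise.
template <typename T>
class ArgBuffer {
public:
    explicit ArgBuffer(std::size_t n)
        : size_(n),
          heap_(n > kInlineArgs ? new T[n]() : nullptr),
          data_(heap_ ? heap_.get() : inline_) {}

    // data_ may point into this object, so copying is disabled.
    ArgBuffer(const ArgBuffer&) = delete;
    ArgBuffer& operator=(const ArgBuffer&) = delete;

    T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }

private:
    std::size_t size_;
    T inline_[kInlineArgs] = {};   // used when n <= kInlineArgs
    std::unique_ptr<T[]> heap_;    // used when n >  kInlineArgs, freed automatically
    T* data_;
};

// Caller code looks the same whether or not the cap is exceeded.
inline long sum_first(std::size_t n) {
    ArgBuffer<long> buf(n);
    for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<long>(i);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += buf[i];
    return total;
}
```

The caller never writes a matching free, so an early return added later cannot leak the heap block.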
>>> >
>>> > Cheers,
>>> > Mark
>>> >
>>>
>>> In a similar vein, could incorporating C++ lead to a simpler low-level
>>> API for numpy?
>>
>>
>> This could definitely happen. One way to do it is to have a stable C API
>> which remains fixed over many releases, and a C++ library which is allowed
>> to change significantly at each release. This is what the LLVM project
>> does, for example. OpenCV is an example of another project which was
>> previously just C, but now has an extensive C++ API.
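A small sketch of that layering, assuming nothing about how NumPy, LLVM, or OpenCV actually arrange it. Every name below (impl::Array1D, demo_array, and the demo_array_* functions) is hypothetical. The C++ class is free to change between releases; the extern "C" functions and the opaque handle are the part that would stay fixed.

```cpp
#include <cstddef>
#include <vector>

// C++ side: internal, allowed to change from release to release.
namespace impl {
class Array1D {
public:
    explicit Array1D(std::size_t n) : data_(n, 0.0) {}
    std::size_t size() const { return data_.size(); }
    void fill(double v) { for (double& x : data_) x = v; }
    double sum() const {
        double s = 0.0;
        for (double x : data_) s += x;
        return s;
    }
private:
    std::vector<double> data_;
};
}  // namespace impl

// The handle is opaque to C callers; only its name is part of the C API.
struct demo_array {
    impl::Array1D inner;
    explicit demo_array(std::size_t n) : inner(n) {}
};

// C side: stable entry points with C linkage.
extern "C" {

demo_array* demo_array_new(std::size_t n) {
    try {
        return new demo_array(n);
    } catch (...) {
        return nullptr;  // exceptions must not cross the C boundary
    }
}

void demo_array_free(demo_array* a) { delete a; }  // deleting nullptr is a no-op

std::size_t demo_array_size(const demo_array* a) { return a->inner.size(); }

void demo_array_fill(demo_array* a, double v) { a->inner.fill(v); }

double demo_array_sum(const demo_array* a) { return a->inner.sum(); }

}  // extern "C"
```

A C caller would see only a forward declaration of demo_array and the five function prototypes, so the layout of Array1D can change without breaking compiled extensions.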
>>
>>
>>> I know Mark has talked before about--in the long-term,
>>> as a dream project to scratch his own itch, and something the BDF12
>>> doesn't necessarily agree with--implementing the great ideas in numpy
>>> as a layered C++ library. (Which would have the added benefit of
>>> making numpy more of a general array library that could be exposed to
>>> any language which can call C++ libraries.)
>>>
>>> I don't imagine that's on the table for anything near-term, but I
>>> wonder if making more of the low-level stuff C++ would make it easier
>>> for performance nuts to write their own code in C/C++ interfacing with
>>> numpy, and then expose it to python. After playing around with ufuncs
>>> at the C level for a little while last summer, I quickly realized any
>>> simplifications would be greatly appreciated.
>>>
>>
>> This is all possible, yes. The way this typically works is that library
>> authors use advanced C++ techniques to get generality, performance, and
>> usability. The library user can then write code which is very simple and
>> written in a way which makes simple errors very difficult to make compared
>> to using a C-like API.
>>
>
> While the longer compile times are going to annoy me, I don't have a
> strong opinion on using C++. One thing to keep in mind though is
> portability. Numpy is used on many platforms and with many compilers.
> Keeping things working on AIX or with a PathScale compiler for example will
> be a lot more difficult when using C++. Or will support for not-so-common
> platforms be reduced?
>
> Ralf
>

Ralf makes a good point.  During the early numpy development days I was
eternally fighting with Solaris compilers.  It's not really a big issue for
us anymore since we have dropped Solaris support.  But I'm +1 on treating
ease of numpy distribution as something to consider.

Chris