[Numpy-discussion] Proposed Roadmap Overview
David Cournapeau
cournape at gmail.com
Fri Feb 17 18:44:34 EST 2012
I don't think C++ has any significant advantage over C for high-performance
libraries. I am not convinced by the number-of-people argument either: it
is not my experience that C++ is easier to maintain in an open source
context, where contributors' skill levels are far from consistent. I doubt
many people have declined to contribute to NumPy because it is in C instead
of C++. While this is somewhat subjective, there are reasons that C is much
more common than C++ in that context.
I would much rather move most parts to Cython, typically to solve subtle
reference-counting issues.
The only way that I know of to have a stable and usable ABI is to wrap the
C++ code in C. Wrapping C++ libraries in Python has always been a pain in
my experience. How are templates or exceptions handled across languages? It
will also be a significant issue on Windows with open source compilers.
Interestingly, the API that clang exports to other languages is in C...
David
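
To illustrate the "wrap the C++ code in C" approach mentioned above, here is a
minimal sketch. It is not code from NumPy or from this thread: the names
`Accumulator`, `acc_handle`, and the `acc_*` functions are all hypothetical.
The idea is that C callers only see an opaque handle and plain C functions,
so no templates, classes, or exceptions cross the boundary.

```cpp
#include <cassert>
#include <vector>

// Hypothetical C++ implementation detail; never visible to C callers.
namespace detail {
class Accumulator {
public:
    void add(double x) { values_.push_back(x); }  // may throw std::bad_alloc
    double sum() const {
        double total = 0.0;
        for (double v : values_) total += v;
        return total;
    }
private:
    std::vector<double> values_;
};
}  // namespace detail

// What a C header would declare: an incomplete type plus plain functions.
extern "C" {
typedef struct acc_handle acc_handle;
acc_handle* acc_create(void);
int acc_add(acc_handle* h, double x);
double acc_sum(const acc_handle* h);
void acc_destroy(acc_handle* h);
}

// The handle's definition lives only on the C++ side.
struct acc_handle {
    detail::Accumulator impl;
};

// Returns NULL on failure instead of letting an exception escape into C.
extern "C" acc_handle* acc_create(void) {
    try {
        return new acc_handle();
    } catch (...) {
        return nullptr;
    }
}

// Returns 0 on success, -1 on a null handle or a caught exception.
extern "C" int acc_add(acc_handle* h, double x) {
    if (h == nullptr) return -1;
    try {
        h->impl.add(x);
        return 0;
    } catch (...) {
        return -1;
    }
}

// Returns 0.0 for a null handle.
extern "C" double acc_sum(const acc_handle* h) {
    return h == nullptr ? 0.0 : h->impl.sum();
}

// Deleting a null handle is a no-op, mirroring free().
extern "C" void acc_destroy(acc_handle* h) {
    delete h;
}
```

Every exception has to be translated into an error code by hand at each entry
point, which is part of the wrapping cost the message refers to.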
On Feb 17, 2012 18:21, "Mark Wiebe" <mwwiebe at gmail.com> wrote:
>
> On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <efiring at hawaii.edu> wrote:
>>
>> On 02/17/2012 05:39 AM, Charles R Harris wrote:
>> >
>> >
>> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau <cournape at gmail.com
>> > <mailto:cournape at gmail.com>> wrote:
>> >
>> > Hi Travis,
>> >
>> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant
>> > <travis at continuum.io <mailto:travis at continuum.io>> wrote:
>> > > Mark Wiebe and I have been discussing off and on (as well as
>> > talking with Charles) a good way forward to balance two competing
>> > desires:
>> > >
>> > > * addition of new features that are needed in NumPy
>> > > * improving the code-base generally and moving towards a
>> > more maintainable NumPy
>> > >
>> > > I know there are loud voices for just focusing on the second of
>> > these and avoiding the first until we have finished that. I
>> > recognize the need to improve the code base, but I will also be
>> > pushing for improvements to the feature-set and user experience in
>> > the process.
>> > >
>> > > As a result, I am proposing a rough outline for releases over the
>> > next year:
>> > >
>> > > * NumPy 1.7 to come out as soon as the serious bugs can be
>> > eliminated. Bryan, Francesc, Mark, and I are able to help triage
>> > some of those.
>> > >
>> > > * NumPy 1.8 to come out in July which will have as many
>> > ABI-compatible feature enhancements as we can add while improving
>> > test coverage and code cleanup. I will post to this list more
>> > details of what we plan to address with it later. Candidates for
>> > inclusion are:
>> > > * resolving the NA/missing-data issues
>> > > * finishing group-by
>> > > * incorporating the start of label arrays
>> > > * incorporating a meta-object
>> > > * a few new dtypes (variable-length string,
>> > variable-length unicode and an enum type)
>> > > * adding ufunc support for flexible dtypes and possibly
>> > structured arrays
>> > > * allowing generalized ufuncs to work on more kinds of
>> > arrays besides just contiguous
>> > > * improving the ability for NumPy to receive JIT-generated
>> > function pointers for ufuncs and other calculation opportunities
>> > > * adding "filters" to Input and Output
>> > > * simple computed fields for dtypes
>> > > * accepting a Data-Type specification as a class or JSON file
>> > > * work towards improving the dtype-addition mechanism
>> > > * re-factoring of code so that it can compile with a C++
>> > compiler and be minimally dependent on Python data-structures.
>> >
>> > This is a pretty exciting list of features. What is the rationale for
>> > code being compiled as C++? IMO, it will be difficult to do so
>> > without preventing useful C constructs, and without removing some of
>> > the existing features (like our use of C99 complex). The subset that
>> > is both C and C++ compatible is quite constraining.
>> >
>> >
>> > I'm in favor of this myself. C++ would allow a lot of code cleanup and
>> > make it easier to provide an extensible base; I think it would be a
>> > natural fit with numpy. Of course, some C++ projects become tangled
>> > messes of inheritance, but I'd be very interested in seeing what a good
>> > C++ designer like Mark, intimately familiar with the numpy code base,
>> > could do. This opportunity might not come by again anytime soon and I
>> > think we should grab onto it. The initial step would be a release whose
>> > code would compile as both C and C++, which mostly comes down to
>> > removing uses of C++ keywords like 'new'.
>> >
>> > I did suggest running it by you for build issues, so please raise any
>> > you can think of. Note that MatPlotLib is in C++, so I don't think the
>> > problems are insurmountable. And choosing a set of compilers to support
>> > is something that will need to be done.
>>
>> It's true that matplotlib relies heavily on C++, both via the Agg
>> library and in its own extension code. Personally, I don't like this; I
>> think it raises the barrier to contributing. C++ is an order of
>> magnitude more complicated than C--harder to read, and much harder to
>> write, unless one is a true expert. In mpl it brings reliance on the CXX
>> library, which Mike D. has had to help maintain. And if it does
>> increase compiler specificity, that's bad.
>
>
> This gets to the recruitment issue, which is one of the most important
> problems I see numpy facing. I personally have contributed a lot of code to
> NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was
> the biggest negative point when I considered whether it was worth
> contributing to the project. I suspect there are many programmers out there
> who are skilled in low-level, high-performance C++, who would be willing to
> contribute, but don't want to code in C.
>
> I believe NumPy should be trying to find people who want to make
> high-performance, close-to-the-metal libraries. This is a very different
> type of programmer than one who wants to program in Python, but is willing
> to dabble in a lower-level language to make something run faster.
> High-performance library development is one of the things the C++ developer
> community does very well, and that community is where we have a good chance
> of finding the programmers NumPy needs.
>
>> I would much rather see development in the direction of sticking with C
>> where direct low-level control and speed are needed, and using cython to
>> gain higher level language benefits where appropriate. Of course, that
>> brings in the danger of reliance on another complex tool, cython. If
>> that danger is considered excessive, then just stick with C.
>
>
> There are many small benefits C++ can offer, even if numpy chooses only
> to use a tiny subset of the C++ language. For example, RAII can be used to
> reliably eliminate PyObject reference leaks.
>
> Consider a regression like this:
> http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html
>
> Fixing this in C would require switching all the relevant usages of
> NPY_MAXARGS to use a dynamic memory allocation. This brings with it the
> potential of easily introducing a memory leak, and is a lot of work to do.
> In C++, this functionality could be placed inside a class, where the
> deterministic construction/destruction semantics eliminate the risk of
> memory leaks and make the code easier to read at the same time. There are
> other examples like this where the C language has forced a suboptimal
> design choice because of how hard it would be to do it better.
>
> Cheers,
> Mark
>
>>
>> Eric
>>
>> >
>> > Chuck
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
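
For readers following the RAII argument in Mark's quoted message, the idea of
putting a dynamic allocation "inside a class" can be sketched roughly as
follows. This is only an illustration under stated assumptions: `ScopedArray`,
`sum_operands`, and the `g_live_buffers` counter are hypothetical names, and
none of this is NumPy code. The scratch buffer stands in for an array that
would otherwise be sized by a fixed cap such as NPY_MAXARGS.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter, present only so the example can check for leaks.
static int g_live_buffers = 0;

// Minimal RAII wrapper: the array is allocated in the constructor and
// released in the destructor, whichever way the enclosing scope is left.
template <typename T>
class ScopedArray {
public:
    explicit ScopedArray(std::size_t n) : data_(new T[n]()), size_(n) {
        ++g_live_buffers;
    }
    ~ScopedArray() {
        delete[] data_;
        --g_live_buffers;
    }
    // Copying is disabled so two objects can never free the same array.
    ScopedArray(const ScopedArray&) = delete;
    ScopedArray& operator=(const ScopedArray&) = delete;

    T& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    T* data_;
    std::size_t size_;
};

// Hypothetical routine with an early-return error path.
// Returns the sum of the operands, or -1 if any operand is negative.
long sum_operands(const long* operands, std::size_t nop) {
    ScopedArray<long> scratch(nop);  // sized at run time, no fixed cap
    for (std::size_t i = 0; i < nop; ++i) {
        if (operands[i] < 0) {
            return -1;  // early exit: the destructor still frees scratch
        }
        scratch[i] = operands[i];
    }
    long total = 0;
    for (std::size_t i = 0; i < scratch.size(); ++i) {
        total += scratch[i];
    }
    return total;
}
```

In the equivalent C code, each `return` would need its own matching `free()`
call, which is where the leak risk described in the message comes from.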