[Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)

David Cournapeau cournape at gmail.com
Fri Jun 6 04:48:53 EDT 2014


On Thu, Jun 5, 2014 at 11:48 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Thu, Jun 5, 2014 at 3:24 PM, David Cournapeau <cournape at gmail.com> wrote:
> > On Thu, Jun 5, 2014 at 2:51 PM, Charles R Harris <charlesr.harris at gmail.com> wrote:
> >> On Thu, Jun 5, 2014 at 6:40 AM, David Cournapeau <cournape at gmail.com>
> >> wrote:
> >>> IMO, what is needed the most is refactoring the internals to extract the
> >>> Python C API low level from the rest of the code, as I think that's the
> >>> main bottleneck to getting more contributors (or getting new core
> >>> features more quickly).
> >>>
> >>
> >> What do you mean by "extract the Python C API"?
> >
> > Poor choice of words: I meant extracting the lower level part of
> > array/ufunc/etc. from its wrapping in the Python C API (with the idea
> > that the latter could be done in Cython, modulo improvements in Cython to
> > manage the binary/code size explosion).
> >
> > IOW, split numpy into core and core-py (I think dynd benefits a lot from
> > that, on top of its feature set).
>
> Can you give some examples of these benefits?


numpy.core is difficult to approach as a codebase: it is big and quite
entangled. While the concepts are sound, there is not much internal
architecture. I would love for numpy to have a proper internal C API. I'd
like to think my effort of splitting multiarray's giant .c files into
multiple files somehow made everybody's life easier.

A lot of the current code is Python C API code, which nobody cares about, and
could be handled by e.g. Cython (although again the feasibility of that
needs to be discussed with the Cython team, as Cython cannot
realistically be used pervasively for numpy ATM, see e.g.
http://mail.scipy.org/pipermail/scipy-dev/2012-July/017717.html).
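To make the proposed core / core-py split a bit more concrete, here is a minimal Python sketch of the layering, under the assumption of a hypothetical two-layer design: a "core" routine that only sees a raw typed buffer and a length, and a thin "binding" layer that does the argument checking and conversion that hand-written Python C API code (or Cython) would otherwise do. The names `core_sum_f64` and `py_sum` are hypothetical and are not part of numpy.

```python
from array import array


def core_sum_f64(buf: memoryview, n: int) -> float:
    """Hypothetical 'core' routine: sums n doubles from a raw buffer.

    It does no Python-level type checking; in the proposed design this
    layer would be plain C behind a proper internal C API.
    """
    total = 0.0
    for i in range(n):
        total += buf[i]
    return total


def py_sum(obj) -> float:
    """Hypothetical 'binding' layer: validates and converts Python input.

    Under the proposal, this is the part that could be generated by
    Cython instead of being written by hand against the Python C API.
    """
    if isinstance(obj, (list, tuple)):
        obj = array("d", obj)  # convert a sequence of numbers to doubles
    buf = memoryview(obj)  # raises TypeError if obj has no buffer interface
    if buf.format != "d" or buf.ndim != 1:
        raise TypeError("expected a 1-d buffer of float64")
    return core_sum_f64(buf, len(buf))
```

The point of the split is that `core_sum_f64` can be read, tested and reused without knowing anything about Python objects, while all the conversion noise lives in one thin, replaceable layer.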

Such a separation would also be helpful for the pie-in-the-sky projects you
mentioned. I think the experience of Wes McKinney and the pandas team with
using Cython for the core vs. just for the C<->Python integration would be
useful there as well.

> I'm kinda wary of refactoring-for-the-sake-of-it -- IME usually it's
> easier, more valuable, and more fun to refactor in the process of making
> some concrete improvement.
>

Sure, I am just suggesting there should be a conscious effort to not just
add features but also think about the internal consistency.

One concrete example is dtype and its pluggability: I would love to see
things like datetime in a separate extension. It would keep us honest about
allowing people to create custom dtypes (last time I checked, there was some
hardcoding that made it hard to do).
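A minimal sketch of what "pluggable" could mean here, assuming a hypothetical registry in which core code looks dtypes up by name instead of hardcoding them, so that something like datetime could be registered from a separate extension through the same path as built-in types. `DType`, `DTypeRegistry`, `format_items` and the `"datetime64[s]"` entry are hypothetical illustrations, not numpy's actual dtype machinery.

```python
import datetime
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class DType:
    """Hypothetical dtype descriptor: a name plus the behaviour core needs."""
    name: str
    to_str: Callable[[object], str]


class DTypeRegistry:
    """Hypothetical registry: core code dispatches through lookups here
    rather than through hardcoded switches on a fixed set of types."""

    def __init__(self) -> None:
        self._types: Dict[str, DType] = {}

    def register(self, dtype: DType) -> None:
        if dtype.name in self._types:
            raise ValueError(f"dtype {dtype.name!r} already registered")
        self._types[dtype.name] = dtype

    def lookup(self, name: str) -> DType:
        try:
            return self._types[name]
        except KeyError:
            raise TypeError(f"unknown dtype {name!r}") from None


def format_items(registry: DTypeRegistry, name: str, items: Iterable) -> List[str]:
    """'Core' code path: works for any registered dtype, built-in or not."""
    dtype = registry.lookup(name)
    return [dtype.to_str(x) for x in items]


# Core registers its own built-in types exactly the way a plugin would.
registry = DTypeRegistry()
registry.register(DType("float64", lambda x: repr(float(x))))

# A separate "extension" adds a datetime type without touching core code;
# values are interpreted as seconds since the Unix epoch (UTC).
registry.register(
    DType(
        "datetime64[s]",
        lambda s: datetime.datetime.fromtimestamp(s, datetime.timezone.utc).isoformat(),
    )
)
```

Because the built-in float type goes through the same `register` call as the datetime "extension", any hardcoding that would block third-party dtypes shows up immediately.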

David