[Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Fri May 18 00:55:53 EDT 2012



Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:

>I'm repeating myself a bit, but my previous thread on this ended up
>being about something else, and also since then I've been on an
>expedition to the hostile waters of python-dev.
>
>I'm crazy enough to believe that I'm proposing a technical solution to 
>alleviate the problems we've faced as a community the past year. No, 
>this will NOT be about NA, and certainly not governance, but do please 
>allow me one paragraph of musings before the meaty stuff.
>
>I believe the Achilles heel of NumPy is the C API and the PyArrayObject.
>The reliance we all have on the NumPy C API means there can in practice
>only be one "array" type per Python process. This makes people *very*
>afraid of creative forking or new competing array libraries (since they
>just can't live in parallel -- like Cython and Pyrex can!), and every
>new feature has to go into ndarray to fully realise itself. This in turn
>means that experimentation with new features has to happen within one or
>a few release cycles; it cannot happen in the wild, by competition, and
>by seeing what works over the course of years before finally making it
>into upstream. Finally, if any new great idea can really only be
>implemented decently if it also impacts thousands of users... that's bad
>both for morale and developer recruitment.
>
>The meat:
>
>There's already of course been work on making the NumPy C API work 
>through an indirection layer to make a more stable ABI. This is about 
>changing the ideas of how that indirection should happen, so that you 
>could in theory implement the C API independently of NumPy.
>
>You could for instance make a "mini-NumPy" that only contains the bare
>essentials, and load that in the same process as the real NumPy, and use
>the C API against objects from both libraries.
>
>I'll assume that we can get a PEP through by waving a magic wand, since
>that makes it easier to focus on essentials. There are many ugly or less
>ugly hacks to make it work on any existing CPython [1], and they
>wouldn't be so ugly if there were PEP blessing for the general idea.
>
>Imagine if PyTypeObject grew an extra pointer "tp_customslots", which 
>pointed to an array of these:
>
>typedef struct {
>     unsigned long tpe_id;   /* which custom slot this is, e.g. PYTYPE_CUSTOM_NUMPY_SLOT */
>     void *tpe_data;         /* slot-specific payload: a function table, an offset, ... */
>} PyTypeObjectCustomSlot;
>
>The ID space is partitioned to anyone who asks, and NumPy is given a 
>large chunk. To insert a "custom slot", you stick it in this list. And 
>you search it linearly for, say, PYTYPE_CUSTOM_NUMPY_SLOT (each type 
>will typically have 0-3 entries so the search is very fast).
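>
>A minimal sketch of how that lookup could look (everything here is
>illustrative; in particular, terminating the array with a tpe_id == 0
>sentinel is just one possible convention, not something decided):
>
>static void *
>find_custom_slot(PyTypeObject *tp, unsigned long tpe_id)
>{
>    /* tp_customslots is the hypothetical new PyTypeObject member */
>    PyTypeObjectCustomSlot *slot = tp->tp_customslots;
>    if (slot == NULL)
>        return NULL;
>    for (; slot->tpe_id != 0; slot++) {   /* typically 0-3 entries */
>        if (slot->tpe_id == tpe_id)
>            return slot->tpe_data;
>    }
>    return NULL;
>}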
>
>I've benchmarked something very similar recently, and the overhead in a
>"hot" situation is on the order of 4-6 cycles. (As for cache, you can at
>least stick the slot array right next to the type object in memory.)
>
>Now, a NumPy array would populate this list with 1-2 entries pointing to
>tables of function pointers for the NumPy C API. This lookup through the
>PyTypeObject would in part replace the current import_array() mechanism.
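>
>Schematically (the ID value, the table members, and the sentinel
>termination are all made up for illustration), the ndarray type could
>carry something like:
>
>/* a hypothetical ID taken from NumPy's chunk of the ID space */
>#define PYTYPE_CUSTOM_NUMPY_CAPI 0x10000001UL
>
>/* a table of function pointers for a subset of the C API;
>   placeholder members only */
>typedef struct {
>    void *(*data)(PyObject *);
>    int (*ndim)(PyObject *);
>    Py_ssize_t *(*dims)(PyObject *);
>} NumPyCAPITable;
>
>static NumPyCAPITable numpy_capi;   /* filled in by the providing library */
>
>static PyTypeObjectCustomSlot ndarray_customslots[] = {
>    {PYTYPE_CUSTOM_NUMPY_CAPI, &numpy_capi},
>    {0, NULL}   /* sentinel, same assumption as above */
>};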
>
>I'd actually propose two such custom slots for ndarray for starters:
>
>a) One PEP 3118-like binary description that exposes raw data pointers
>(without the PEP 3118 red tape)

To be more clear: the custom slot in the PyTypeObject would contain an
offset that you could add to your PyObject* to get to this information.

Dag

>b) A function pointer table for a suitable subset of the NumPy C API
>(obviously not array construction and so on)
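>
>To make a) and the offset idea above concrete (the struct layout, its
>members, and the ID below are purely illustrative, not a proposed
>format), the slot's data could be an offset used like this:
>
>/* what sits at (char *)obj + offset for objects supporting slot a) */
>typedef struct {
>    char *data;              /* pointer to the raw array data */
>    int ndim;
>    Py_ssize_t *shape;
>    Py_ssize_t *strides;
>    int typenum;             /* some description of the element type */
>} NumPyArrayView;
>
>/* hypothetical ID for slot a), from NumPy's chunk of the ID space */
>#define PYTYPE_CUSTOM_NUMPY_ARRAYVIEW 0x10000002UL
>
>/* a zero offset would need special-casing; glossed over here */
>Py_ssize_t offset = (Py_ssize_t)find_custom_slot(Py_TYPE(obj),
>                                                 PYTYPE_CUSTOM_NUMPY_ARRAYVIEW);
>NumPyArrayView *view = (NumPyArrayView *)((char *)obj + offset);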
>
>The all-important PyArray_DATA/DIMS/... would be macros that try for a)
>first, but fall back to b). Things like PyArray_Check would actually
>check for support of these slots, "duck typing", rather than the Python
>type (of course, this could only be done at a major revision like NumPy
>2.0 or 3.0).
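>
>Continuing the sketch above (again with made-up names, and written as a
>function rather than a macro for readability), PyArray_DATA could then
>amount to:
>
>static void *
>PyArray_DATA_sketch(PyObject *obj)
>{
>    void *off = find_custom_slot(Py_TYPE(obj), PYTYPE_CUSTOM_NUMPY_ARRAYVIEW);
>    if (off != NULL) {
>        /* fast path: slot a), raw data exposed at a fixed offset */
>        NumPyArrayView *view = (NumPyArrayView *)((char *)obj + (Py_ssize_t)off);
>        return view->data;
>    }
>    /* fall back to slot b): go through the function pointer table */
>    NumPyCAPITable *capi = find_custom_slot(Py_TYPE(obj), PYTYPE_CUSTOM_NUMPY_CAPI);
>    if (capi != NULL)
>        return capi->data(obj);
>    return NULL;   /* does not support the "array" slots at all */
>}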
>
>The overhead should be on the order of 5 cycles per C API call. That 
>should be fine for anything but the use of PyArray_DATA inside a tight 
>loop (which is a bad idea anyway).
>
>For now I just want to establish if there's support for this general
>idea, and see if I can get some weight behind a PEP (and ideally a
>co-author), which would make this a general approach and something more
>than an ugly NumPy-specific hack. We'd also have good use for such a PEP
>in Cython (and, I believe, Numba/SciPy in CEP 1000).
>
>Dag
>
>[1] There are many ways of doing similar things in current Python, such
>as standardising across many participating projects on using a common
>metaclass. Here's another alternative that doesn't add such
>inter-project dependencies but is more renegade:
>http://wiki.cython.org/enhancements/cep1001

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.


