[Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive

Sun May 27 17:27:19 EDT 2012

On 05/18/2012 01:48 PM, mark florisson wrote:
> On 17 May 2012 23:53, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>> I'm repeating myself a bit, but my previous thread on this ended up
>> being about something else, and also since then I've been on an
>> expedition to the hostile waters of python-dev.
>>
>> I'm crazy enough to believe that I'm proposing a technical solution to
>> alleviate the problems we've faced as a community the past year. No,
>> this will NOT be about NA, and certainly not governance, but do please
>> allow me one paragraph of musings before the meaty stuff.
>>
>> I believe the Achilles heel of NumPy is the C API and the PyArrayObject.
>> The reliance we all have on the NumPy C API means there can in practice
>> only be one "array" type per Python process. This makes people *very*
>> afraid of creative forking or new competing array libraries (since they
>> just can't live in parallel -- like Cython and Pyrex can!), and every
>> new feature has to go into ndarray to fully realise itself. This in turn
>> means that experimentation with new features has to happen within one or
>> a few release cycles; it cannot happen in the wild, by competition,
>> and by seeing what works over the course of years before finally making
>> it into upstream. Finally, if any new great idea can really only be
>> implemented decently if it also impacts thousands of users...that's bad
>> both for morale and developer recruitment.
>>
>> The meat:
>>
>> There has of course already been work on making the NumPy C API work
>> through an indirection layer to make a more stable ABI. This is about
>> changing the ideas of how that indirection should happen, so that you
>> could in theory implement the C API independently of NumPy.
>>
>> You could for instance make a "mini-NumPy" that only contains the bare
>> essentials, and load that in the same process as the real NumPy, and use
>> the C API against objects from both libraries.
>>
>> I'll assume that we can get a PEP through by waving a magic wand, since
>> that makes it easier to focus on essentials. There are many more or less
>> ugly hacks to make it work on any existing CPython [1], and they
>> wouldn't be so ugly if there were PEP blessing for the general idea.
>>
>> Imagine if PyTypeObject grew an extra pointer "tp_customslots", which
>> pointed to an array of these:
>>
>> typedef struct {
>>      unsigned long tpe_id;
>>      void *tpe_data;
>> } PyTypeObjectCustomSlot;
>>
>> The ID space is partitioned to anyone who asks, and NumPy is given a
>> large chunk. To insert a "custom slot", you stick it in this list. And
>> you search it linearly for, say, PYTYPE_CUSTOM_NUMPY_SLOT (each type
>> will typically have 0-3 entries so the search is very fast).
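
A minimal sketch of the linear search described above, under stated assumptions: MockTypeObject is a hypothetical stand-in for a PyTypeObject that has grown the tp_customslots pointer, the table is assumed to end with an entry whose tpe_id is 0, and the numeric value given to PYTYPE_CUSTOM_NUMPY_SLOT is made up for illustration. None of this is existing CPython or NumPy API.

```c
#include <assert.h>
#include <stddef.h>

/* Slot entry, as given in the proposal. */
typedef struct {
    unsigned long tpe_id;
    void *tpe_data;
} PyTypeObjectCustomSlot;

/* Hypothetical stand-in for PyTypeObject with the proposed extra pointer. */
typedef struct {
    const char *tp_name;
    /* NULL, or a table terminated by tpe_id == 0 (an assumption). */
    PyTypeObjectCustomSlot *tp_customslots;
} MockTypeObject;

/* Made-up ID value; the proposal leaves ID allocation to a registry. */
#define PYTYPE_CUSTOM_NUMPY_SLOT 0x4e500001UL

/* Linear search: returns the slot's tpe_data, or NULL if the type
   does not carry a slot with the requested ID. */
static void *find_custom_slot(const MockTypeObject *type, unsigned long id)
{
    const PyTypeObjectCustomSlot *slot;

    assert(type != NULL);
    if (type->tp_customslots == NULL)
        return NULL;
    for (slot = type->tp_customslots; slot->tpe_id != 0; ++slot) {
        if (slot->tpe_id == id)
            return slot->tpe_data;
    }
    return NULL;
}
```

A caller would treat a NULL result as "this type does not support the interface", which is what makes the check behave like duck typing.
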
>>
>> I've benchmarked something very similar recently, and the overhead in a
>> "hot" situation is on the order of 4-6 cycles. (As for cache, you can at
>> least stick the slot array right next to the type object in memory.)
>>
>> Now, a NumPy array would populate this list with 1-2 entries pointing to
>> tables of function pointers for the NumPy C API. This lookup through the
>> PyTypeObject would in part replace the current import_array() mechanism.
>>
>> I'd actually propose two such custom slots for ndarray for starters:
>>
>>   a) One PEP 3118-like binary description that exposes raw data pointers
>> (without the PEP 3118 red tape)
>>
>>   b) A function pointer table for a suitable subset of the NumPy C API
>> (obviously not array construction and so on)
>>
>> The all-important PyArray_DATA/DIMS/... would be macros that try for a)
>> first, but fall back to b). Things like PyArray_Check would actually
>> check for support of these slots, "duck typing", rather than the Python
>> type (of course, this could only be done at a major revision like NumPy
>> 2.0 or 3.0).
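
One way to picture the "try a) first, fall back to b)" accessors is sketched below. Everything here is an assumption made for illustration: the Mock* types, the two slot IDs, the RawLayout struct (which describes slot a) as a byte offset to a data pointer inside the object), and the ApiTable struct (a one-entry function table for slot b)). The real layout of either slot is not specified in this thread.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long tpe_id;
    void *tpe_data;
} PyTypeObjectCustomSlot;

/* Hypothetical stand-ins for a type object and an instance. */
typedef struct { PyTypeObjectCustomSlot *tp_customslots; } MockType;
typedef struct { MockType *ob_type; void *payload; } MockObject;

#define SLOT_RAW_LAYOUT 1UL /* made-up ID for slot a) */
#define SLOT_API_TABLE  2UL /* made-up ID for slot b) */

/* Slot a): where the raw data pointer lives inside the object. */
typedef struct { size_t data_offset; } RawLayout;
/* Slot b): function pointer table for a subset of the C API. */
typedef struct { void *(*get_data)(MockObject *); } ApiTable;

/* Linear search over a table terminated by tpe_id == 0 (assumed). */
static void *find_slot(const MockType *type, unsigned long id)
{
    const PyTypeObjectCustomSlot *slot = type->tp_customslots;
    if (slot == NULL)
        return NULL;
    for (; slot->tpe_id != 0; ++slot) {
        if (slot->tpe_id == id)
            return slot->tpe_data;
    }
    return NULL;
}

/* Example slot b) implementation used by a type without slot a). */
static void *example_get_data(MockObject *obj) { return obj->payload; }

/* Stand-in for PyArray_DATA: direct memory read if slot a) exists,
   otherwise an indirect call through slot b), otherwise NULL. */
static void *array_data(MockObject *obj)
{
    const RawLayout *raw;
    const ApiTable *api;

    assert(obj != NULL && obj->ob_type != NULL);
    raw = find_slot(obj->ob_type, SLOT_RAW_LAYOUT);
    if (raw != NULL)
        return *(void **)((char *)obj + raw->data_offset);
    api = find_slot(obj->ob_type, SLOT_API_TABLE);
    if (api != NULL)
        return api->get_data(obj);
    return NULL;
}

/* Stand-in for a duck-typed PyArray_Check: any supported slot counts. */
static int array_check(MockObject *obj)
{
    return find_slot(obj->ob_type, SLOT_RAW_LAYOUT) != NULL
        || find_slot(obj->ob_type, SLOT_API_TABLE) != NULL;
}
```

In this sketch the fast path costs one table search plus a load, while the fallback adds an indirect call, so a library that cannot expose its memory layout can still take part through slot b).
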
>>
>> The overhead should be on the order of 5 cycles per C API call. That
>> should be fine for anything but the use of PyArray_DATA inside a tight
>> loop (which is a bad idea anyway).
>>
>> For now I just want to establish if there's support for this general
>> idea, and see if I can get some weight behind a PEP (and ideally a
>> co-author), which would make this a general approach and something more
>> than an ugly NumPy specific hack. We'd also have good use for such a PEP
>> in Cython (and, I believe, Numba/SciPy in CEP 1000).
>
> Well, you have my vote, but you already knew that. I'd also be willing
> to co-author any PEP etc, but I'm sensing it may be more useful to
> have support from people from different projects. Personally, I think
> if this is to succeed, we first need to fix the design to work for
> subclasses (I think one may just want to memcpy the interface
> information over to the subclass, e.g. through a convenience function
> that allows one to add more as well). If we have a solid idea of the
> technical implementation, we should actually implement it and present
> the benchmarks, comparing the results to capsules as attributes (and
> to the _PyType_Lookup approach).
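
The convenience function for subclasses suggested above could look roughly like the following sketch. The name inherit_slots, the malloc-based ownership, and the tpe_id == 0 terminator are assumptions for illustration, not part of any existing implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned long tpe_id;
    void *tpe_data;
} PyTypeObjectCustomSlot;

/* Number of entries before the tpe_id == 0 terminator (assumed convention). */
static size_t count_slots(const PyTypeObjectCustomSlot *slots)
{
    size_t n = 0;
    if (slots != NULL) {
        while (slots[n].tpe_id != 0)
            ++n;
    }
    return n;
}

/* Hypothetical helper: builds the slot table for a subclass by copying
   the base table with memcpy and adding n_extra new entries. The extra
   entries go first so that a first-match linear search lets the subclass
   override a base slot with the same ID. Returns a malloc'd table that
   the caller owns, or NULL on allocation failure. */
static PyTypeObjectCustomSlot *inherit_slots(const PyTypeObjectCustomSlot *base,
                                             const PyTypeObjectCustomSlot *extra,
                                             size_t n_extra)
{
    size_t n_base = count_slots(base);
    PyTypeObjectCustomSlot *out;

    assert(n_extra == 0 || extra != NULL);
    out = malloc((n_base + n_extra + 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    if (n_extra > 0)
        memcpy(out, extra, n_extra * sizeof *out);
    if (n_base > 0)
        memcpy(out + n_extra, base, n_base * sizeof *out);
    out[n_base + n_extra].tpe_id = 0;   /* terminator */
    out[n_base + n_extra].tpe_data = NULL;
    return out;
}
```

Putting the subclass entries first is a design choice of this sketch; appending them after the base entries would instead make base slots take precedence.
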

Unless there are any holes in my fresh metaclass implementation, I think 
that is good enough that we can wait a year and get actual adoption 
before pushing for a PEP. That would also make the PEP a lot stronger. I 
do believe it should happen eventually though.

Here's my post on the Cython list reposted to this list:

"""
So I finally got around to implementing this:

https://github.com/dagss/pyextensibletype

The documentation is now a draft in the NumFOCUS SEP repo, which I believe 
is a better place to store cross-project standards like this. (The NumPy 
docstring standard will be SEP 100.)

https://github.com/numfocus/sep/blob/master/sep200.rst

Summary:

  - No common runtime dependency

  - 1 ns overhead per lookup (that's for the custom slot *alone*, no 
fast-callable signature matching or similar)

  - Slight annoyance: Types that want to use the metaclass must be a 
PyHeapExtensibleType, to make the binary layout work with how CPython 
makes subclasses from Python scripts

My conclusion: I think the metaclass approach should work really well.
"""

Dag