On 05/18/2012 01:48 PM, mark florisson wrote:
On 17 May 2012 23:53, Dag Sverre Seljebotn
wrote: I'm repeating myself a bit, but my previous thread on this ended up being about something else, and since then I've also been on an expedition to the hostile waters of python-dev.
I'm crazy enough to believe that I'm proposing a technical solution to alleviate the problems we've faced as a community the past year. No, this will NOT be about NA, and certainly not governance, but do please allow me one paragraph of musings before the meaty stuff.
I believe the Achilles heel of NumPy is the C API and the PyArrayObject. The reliance we all have on the NumPy C API means there can in practice only be one "array" type per Python process. This makes people *very* afraid of creative forking or of new competing array libraries (since they just can't live in parallel -- like Cython and Pyrex can!), and every new feature has to go into ndarray to fully realise itself. This in turn means that experimentation with new features has to happen within one or a few release cycles; it cannot happen in the wild, by competition, by seeing what works over the course of years before finally making it into upstream. Finally, if any great new idea can only be implemented decently by also impacting thousands of users... that's bad both for morale and for developer recruitment.
The meat:
There's of course already been work on making the NumPy C API work through an indirection layer to get a more stable ABI. This is about changing how that indirection happens, so that you could in theory implement the C API independently of NumPy.
You could for instance make a "mini-NumPy" that only contains the bare essentials, and load that in the same process as the real NumPy, and use the C API against objects from both libraries.
I'll assume that we can get a PEP through by waving a magic wand, since that makes it easier to focus on essentials. There are many more or less ugly hacks to make it work on any existing CPython [1], and they wouldn't be so ugly with PEP blessing for the general idea.
Imagine if PyTypeObject grew an extra pointer "tp_customslots", which pointed to an array of these:
    typedef struct {
        unsigned long tpe_id;
        void *tpe_data;
    } PyTypeObjectCustomSlot;
The ID space is partitioned out to anyone who asks, and NumPy is given a large chunk. To insert a "custom slot", you stick it in this list, and you search the list linearly for, say, PYTYPE_CUSTOM_NUMPY_SLOT (each type will typically have 0-3 entries, so the search is very fast).
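In code, the lookup could be roughly like the sketch below. This assumes a companion count field next to tp_customslots; the field names, the PyCustomSlots_Find helper and the slot ID are all made up for illustration, not a worked-out design:

    /* Sketch only: assumes PyTypeObject gained tp_customslots plus a
       hypothetical companion length field, tp_customslots_count. */
    #define PYTYPE_CUSTOM_NUMPY_SLOT 0x10000001UL  /* made-up ID */

    static void *
    PyCustomSlots_Find(PyTypeObject *tp, unsigned long slot_id)
    {
        Py_ssize_t i;
        for (i = 0; i < tp->tp_customslots_count; i++) {
            if (tp->tp_customslots[i].tpe_id == slot_id)
                return tp->tp_customslots[i].tpe_data;
        }
        return NULL;  /* this type doesn't provide the slot */
    }

A caller would then do PyCustomSlots_Find(Py_TYPE(obj), PYTYPE_CUSTOM_NUMPY_SLOT) and branch on NULL; the linear scan over 0-3 entries is what keeps the hot-path cost down to a few cycles.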
I've benchmarked something very similar recently, and the overhead in a "hot" situation is on the order of 4-6 cycles. (As for cache, you can at least stick the slot array right next to the type object in memory.)
Now, a NumPy array would populate this list with 1-2 entries pointing to tables of function pointers for the NumPy C API. This lookup through the PyTypeObject would in part replace the current import_array() mechanism.
I'd actually propose two such custom slots for ndarray for starters:
a) One PEP 3118-like binary description that exposes raw data pointers (without the PEP 3118 red tape)
b) A function pointer table for a suitable subset of the NumPy C API (obviously not array construction and so on)
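To make that concrete, here is a rough registration sketch for ndarray. Everything below is hypothetical -- the IDs, the descriptor and table types, the helper implementations and the count field are illustrations of the idea only (the struct field names data/nd/dimensions are the current 1.x PyArrayObject ones):

    #include <stddef.h>  /* offsetof */

    /* Made-up IDs from NumPy's chunk of the partitioned ID space */
    #define PYTYPE_CUSTOM_NUMPY_BUFFER 0x10000001UL  /* slot a) */
    #define PYTYPE_CUSTOM_NUMPY_CAPI   0x10000002UL  /* slot b) */

    /* a) PEP 3118-like binary description: offsets into the object
       struct where the raw fields live */
    typedef struct {
        size_t data_offset;
        size_t ndim_offset;
        size_t dims_offset;
    } NumPyBufferDescr;

    /* b) function-pointer table for a subset of the C API */
    typedef struct {
        void *(*get_data)(PyObject *arr);
        int (*get_ndim)(PyObject *arr);
        /* ...the rest of the exported subset... */
    } NumPyCAPITable;

    static void *ndarray_get_data(PyObject *arr) {
        return ((PyArrayObject *)arr)->data;
    }
    static int ndarray_get_ndim(PyObject *arr) {
        return ((PyArrayObject *)arr)->nd;
    }

    static NumPyBufferDescr ndarray_buffer_descr = {
        offsetof(PyArrayObject, data),
        offsetof(PyArrayObject, nd),
        offsetof(PyArrayObject, dimensions),
    };
    static NumPyCAPITable ndarray_capi = {
        ndarray_get_data, ndarray_get_ndim
    };

    static PyTypeObjectCustomSlot ndarray_slots[] = {
        { PYTYPE_CUSTOM_NUMPY_BUFFER, &ndarray_buffer_descr },
        { PYTYPE_CUSTOM_NUMPY_CAPI,   &ndarray_capi },
    };

    /* At type-initialization time (again, hypothetical fields): */
    /*   PyArray_Type.tp_customslots = ndarray_slots;            */
    /*   PyArray_Type.tp_customslots_count = 2;                  */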
The all-important PyArray_DATA/DIMS/... would be macros that try for a) first, but fall back to b). Things like PyArray_Check would actually check for support of these slots, "duck typing", rather than the Python type (of course, this could only be done at a major revision like NumPy 2.0 or 3.0).
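Reusing the hypothetical names from the sketches above, PyArray_DATA could then look something like this (written as an inline function rather than a macro for readability):

    /* Sketch: try slot a) and read the pointer straight out of the
       object; fall back to the slot b) function table. */
    static inline void *
    PyArray_DATA(PyObject *arr)
    {
        NumPyBufferDescr *descr = (NumPyBufferDescr *)
            PyCustomSlots_Find(Py_TYPE(arr), PYTYPE_CUSTOM_NUMPY_BUFFER);
        if (descr != NULL)
            return *(void **)((char *)arr + descr->data_offset);
        NumPyCAPITable *capi = (NumPyCAPITable *)
            PyCustomSlots_Find(Py_TYPE(arr), PYTYPE_CUSTOM_NUMPY_CAPI);
        return capi->get_data(arr);
    }

    /* PyArray_Check becomes duck typing on the slots: */
    #define PyArray_Check(obj) \
        (PyCustomSlots_Find(Py_TYPE(obj), PYTYPE_CUSTOM_NUMPY_BUFFER) != NULL \
         || PyCustomSlots_Find(Py_TYPE(obj), PYTYPE_CUSTOM_NUMPY_CAPI) != NULL)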
The overhead should be on the order of 5 cycles per C API call. That should be fine for anything but the use of PyArray_DATA inside a tight loop (which is a bad idea anyway).
For now I just want to establish if there's support for this general idea, and see if I can get some weight behind a PEP (and ideally a co-author), which would make this a general approach and something more than an ugly NumPy specific hack. We'd also have good use for such a PEP in Cython (and, I believe, Numba/SciPy in CEP 1000).
Well, you have my vote, but you already knew that. I'd also be willing to co-author any PEP, etc., but I sense it may be more useful to have support from people from different projects. Personally, I think that if this is to succeed, we first need to fix the design to work for subclasses (I think one may just want to memcpy the interface information over to the subclass, e.g. through a convenience function that also allows one to add more entries; a sketch follows). Once we have a solid idea of the technical implementation, we should actually implement it and present the benchmarks, comparing the results to capsules as attributes (and to the _PyType_Lookup approach).
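Roughly, the convenience function I have in mind could be something like this sketch (the field and helper names are made up, matching the hypothetical tp_customslots/tp_customslots_count fields from earlier):

    #include <string.h>  /* memcpy */

    /* Sketch: give the subclass its own slot array, pre-filled with
       the base type's entries and with room for `extra` new ones. */
    static int
    PyCustomSlots_InheritAndExtend(PyTypeObject *sub, PyTypeObject *base,
                                   Py_ssize_t extra)
    {
        Py_ssize_t n = base->tp_customslots_count;
        PyTypeObjectCustomSlot *slots = PyMem_Malloc(
            (n + extra) * sizeof(PyTypeObjectCustomSlot));
        if (slots == NULL)
            return -1;
        memcpy(slots, base->tp_customslots,
               n * sizeof(PyTypeObjectCustomSlot));
        sub->tp_customslots = slots;
        sub->tp_customslots_count = n;  /* caller appends up to `extra` */
        return 0;
    }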
Unless there are any holes in my fresh metaclass implementation, I think it is good enough that we can wait a year and get actual adoption before pushing for a PEP. That would also make the PEP a lot stronger. I do believe it should happen eventually, though.

Here's my post on the Cython list, reposted to this list:

"""
So I finally got around to implementing this:

https://github.com/dagss/pyextensibletype

Documentation is now in a draft in the NumFOCUS SEP repo, which I believe is a better place to store cross-project standards like this. (The NumPy docstring standard will be SEP 100.)

https://github.com/numfocus/sep/blob/master/sep200.rst

Summary:

 - No common runtime dependency

 - 1 ns overhead per lookup (that's for the custom slot *alone*, no fast-callable signature matching or similar)

 - Slight annoyance: types that want to use the metaclass must be a PyHeapExtensibleType, to make the binary layout work with how CPython makes subclasses from Python scripts

My conclusion: I think the metaclass approach should work really well.
"""

Dag