[Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

David Cournapeau cournape at gmail.com
Tue Jun 26 10:15:25 EDT 2012

On Tue, Jun 26, 2012 at 2:40 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 06/26/2012 01:48 PM, David Cournapeau wrote:
>> Hi,
>> I am just continuing the discussion around ABI/API, the technical side
>> of things that is, as this is unrelated to the 1.7.x release.
>> On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> On 06/26/2012 11:58 AM, David Cournapeau wrote:
>>>> On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn
>>>> <d.s.seljebotn at astro.uio.no> wrote:
>>>>> On 06/26/2012 05:35 AM, David Cournapeau wrote:
>>>>>> On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík <ondrej.certik at gmail.com> wrote:
>>>>>>> My understanding is that Travis is simply trying to stress "We have to
>>>>>>> think about the implications of our changes on existing users." and
>>>>>>> also that little changes (with the best intentions!) that however mean
>>>>>>> either a breakage or confusion for users (due to historical reasons)
>>>>>>> should be avoided if possible. And I very strongly feel the same way.
>>>>>>> And I think that most people on this list do as well.
>>>>>> I think Travis is more concerned about API than ABI changes (in that
>>>>>> example for 1.4, the ABI breakage was caused by a change that was
>>>>>> pushed by Travis IIRC).
>>>>>> The relative importance of API vs ABI is a tough one: I think ABI
>>>>>> breakage is as bad as API breakage (but matters in different
>>>>>> circumstances), but it is hard to improve the situation around our ABI
>>>>>> without changing the API (especially everything around macros and
>>>>>> publicly accessible structures). Changing this is politically
>>>>> But I think it is *possible* to get to a situation where ABI isn't
>>>>> broken without changing API. I have posted such a proposal.
>>>>> If one uses the kind of C-level duck typing I describe in the link
>>>>> below, one would do
>>>>> typedef PyObject PyArrayObject;
>>>>> typedef struct {
>>>>>      ...
>>>>> } NumPyArray; /* used to be PyArrayObject */
>>>> Maybe we're just in violent agreement, but whatever ends up being used
>>>> would require changing the *current* C API, right? If one wants to
>>> Accessing arr->dims[i] directly would need to change. But that's been
>>> discouraged for a long time. By "API" I meant access through the macros.
>>> One of the changes under discussion here is to change PyArray_SHAPE from
>>> a macro that accepts both PyObject* and PyArrayObject* to a function
>>> that only accepts PyArrayObject* (hence breakage). I'm saying that under
>>> my proposal, assuming I or somebody else can find the time to implement
>>> it, you can both make it a function and have it accept both
>>> PyObject* and PyArrayObject* (since they are the same), undoing the
>>> breakage while still allowing the ABI to be hidden.
>>> (It doesn't give you full flexibility in ABI, it does require that you
>>> somewhere have an "npy_intp dims[nd]" with the same lifetime as your
>>> object, etc., but I don't consider that a big disadvantage).
>>>> allow for changes in our structures more freely, we have to hide them
>>>> from the headers, which means breaking the code that depends on the
>>>> structure binary layout. Any code that accesses those directly will need
>>>> to be changed.
>>>> There is the particular issue of iterators, which seem quite difficult
>>>> to make "ABI-safe" without losing significant performance.
>>> I don't agree (for some meanings of "ABI-safe"). You can export the data
>>> (dataptr/shape/strides) through the ABI, then the iterator uses these in
>>> whatever way it wishes consumer-side. Sort of like PEP 3118 without the
>>> performance degradation. The only sane way IMO of doing iteration is
>>> building it into the consumer anyway.
>> (I have not read the whole cython discussion yet)
> I'll try to write a summary and post it when I can get around to it.
>> What do you mean by "building iteration in the consumer" ? My
> "consumer" is the user of the NumPy C API. So I meant that the iteration
> logic is all in C header files and compiled again for each such
> consumer. Iterators don't cross the ABI boundary.
>> understanding is that any data export would be done through a level of
>> indirection (dataptr/shape/strides). Conceptually, I can't see how one
>> could keep ABI without that level of indirection without some compile.
>> In the case of iterators, that means several pointer dereferences per
>> sample -- i.e. the tight loop issue you mentioned earlier for
>> PyArray_DATA is the common case for iterators.
> Even if you do indirection, iterator utilities that are compiled in the
> "consumer"/user code can cache the data that's retrieved.
> Iterators just do
> // setup crossing ABI
> npy_intp *shape = PyArray_DIMS(arr);
> npy_intp *strides = PyArray_STRIDES(arr);
> ...
> // performance-sensitive code just accesses cached pointers and doesn't
> // cross ABI

The problem is that iterators need more than this. But thinking more
about it, I am no longer so sure that we could not get there. I will
need to play with some code.
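
To make the caching idea concrete, here is a minimal sketch. It is not
the NumPy C API: HypoArray, the hypo_* accessors and sum2d are
hypothetical names invented for illustration, and ptrdiff_t stands in
for npy_intp. The struct layout is shown in the same file only so the
example compiles on its own; in a real setup it would live in the
library's private sources, and consumers would see just the opaque
typedef and the accessor prototypes.

```c
#include <stddef.h>

/* "Library" side (hypothetical): consumers only see this opaque type. */
typedef struct HypoArray HypoArray;

/* Private layout. Shown here only to keep the sketch self-contained;
 * it could change freely as long as the accessors below keep working. */
struct HypoArray {
    char *data;          /* first element */
    int nd;              /* number of dimensions */
    ptrdiff_t *shape;    /* nd extents */
    ptrdiff_t *strides;  /* nd strides, in bytes */
};

/* Exported accessors: the only calls that cross the ABI boundary. */
char *hypo_data(const HypoArray *a) { return a->data; }
int hypo_ndim(const HypoArray *a) { return a->nd; }
const ptrdiff_t *hypo_shape(const HypoArray *a) { return a->shape; }
const ptrdiff_t *hypo_strides(const HypoArray *a) { return a->strides; }

/* Consumer side: sum a 2-d array of doubles. The accessors are called
 * once during setup; the loops only touch cached locals. */
double sum2d(const HypoArray *a)
{
    if (hypo_ndim(a) != 2) {
        return 0.0;  /* this sketch only handles the 2-d case */
    }

    /* setup, crossing the ABI */
    char *data = hypo_data(a);
    const ptrdiff_t *shape = hypo_shape(a);
    const ptrdiff_t *strides = hypo_strides(a);
    const ptrdiff_t n0 = shape[0], n1 = shape[1];
    const ptrdiff_t s0 = strides[0], s1 = strides[1];

    /* performance-sensitive part, no ABI crossing */
    double total = 0.0;
    for (ptrdiff_t i = 0; i < n0; ++i) {
        const char *row = data + i * s0;
        for (ptrdiff_t j = 0; j < n1; ++j) {
            total += *(const double *)(row + j * s1);
        }
    }
    return total;
}
```

Because the loop works from byte strides, the same consumer code handles
a transposed view without the library being involved. What this sketch
leaves out is exactly the "more than this" part: broadcasting, buffering
and multi-operand iteration.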

> Going slightly OT, then IMO, the *only* long-term solution in 2012 is
> LLVM. That allows you to do any level of inlining and special casing and
> optimization at run-time, which is the only way of matching needs for
> performance with using Python at all.
> Mark Florisson is heading down that road this summer with his 'minivect'
> project (essentially, code generation for optimal iteration over NumPy
> (or NumPy-like) arrays that can be used both by Cython (C code
> generation backend) and Numba (LLVM code generation backend)).
> Relying on C++ metaprogramming to implement iterators is like using the
> technology of the 80's to build the NumPy of the 2010's. It can only be
> exported to Python in a crippled form, so kind of useless. (C++ to
> implement the core that sits behind an ABI is another matter, I don't
> have an opinion on that. But iterators can't be behind the ABI, as I
> think we agree on.)

Well, no need to convince me about which of the two solutions is the
most appropriate. I was just trying to appear more unbiased than I
really am :)

