[Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

David Cournapeau cournape at gmail.com
Tue Jun 26 07:48:30 EDT 2012


Hi,

I am just continuing the ABI/API discussion (the technical side of
things, that is), as it is unrelated to the 1.7.x release.

On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 06/26/2012 11:58 AM, David Cournapeau wrote:
>> On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no>  wrote:
>>> On 06/26/2012 05:35 AM, David Cournapeau wrote:
>>>> On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík<ondrej.certik at gmail.com>    wrote:
>>>>
>>>>>
>>>>> My understanding is that Travis is simply trying to stress "We have to
>>>>> think about the implications of our changes on existing users." and
>>>>> also that little changes (with the best intentions!) that however mean
>>>>> either a breakage or confusion for users (due to historical reasons)
>>>>> should be avoided if possible. And I very strongly feel the same way.
>>>>> And I think that most people on this list do as well.
>>>>
>>>> I think Travis is more concerned about API than ABI changes (in that
>>>> example for 1.4, the ABI breakage was caused by a change that was
>>>> pushed by Travis IIRC).
>>>>
>>>> The relative importance of API vs ABI is a tough one: I think ABI
>>>> breakage is as bad as API breakage (but matter in different
>>>> circumstances), but it is hard to improve the situation around our ABI
>>>> without changing the API (especially everything around macros and
>>>> publicly accessible structures). Changing this is politically
>>>
>>> But I think it is *possible* to get to a situation where ABI isn't
>>> broken without changing API. I have posted such a proposal.
>>> If one uses the kind of C-level duck typing I describe in the link
>>> below, one would do
>>>
>>> typedef PyObject PyArrayObject;
>>>
>>> typedef struct {
>>>     ...
>>> } NumPyArray; /* used to be PyArrayObject */
>>
>> Maybe we're just in violent agreement, but whatever ends up being used
>> would require changing the *current* C API, right? If one wants to
>
> Accessing arr->dims[i] directly would need to change. But that's been
> discouraged for a long time. By "API" I meant access through the macros.
>
> One of the changes under discussion here is to change PyArray_SHAPE from
> a macro that accepts both PyObject* and PyArrayObject* to a function
> that only accepts PyArrayObject* (hence breakage). I'm saying that under
> my proposal, assuming I or somebody else can find the time to implement
> it, you can both make it a function and have it accept both
> PyObject* and PyArrayObject* (since they are the same), undoing the
> breakage while still allowing the ABI to be hidden.
>
> (It doesn't give you full flexibility in ABI, it does require that you
> somewhere have an "npy_intp dims[nd]" with the same lifetime as your
> object, etc., but I don't consider that a big disadvantage).
>
>> allow for changes in our structures more freely, we have to hide them
>> from the headers, which means breaking the code that depends on the
>> structure binary layout. Any code that access those directly will need
>> to be changed.
>>
>> There is the particular issue of iterators, which seems quite difficult
>> to make "ABI-safe" without losing significant performance.
>
> I don't agree (for some meanings of "ABI-safe"). You can export the data
> (dataptr/shape/strides) through the ABI, then the iterator uses these in
> whatever way it wishes consumer-side. Sort of like PEP 3118 without the
> performance degradation. The only sane way IMO of doing iteration is
> building it into the consumer anyway.

(I have not read the whole cython discussion yet)

What do you mean by "building iteration into the consumer"? My
understanding is that any data export would be done through a level of
indirection (dataptr/shape/strides). Conceptually, I can't see how one
could keep the ABI stable without that level of indirection, short of
some compilation step. In the case of iterators, that means multiple
pointer dereferences per element -- i.e. the tight-loop issue you
mentioned earlier for PyArray_DATA is the common case for iterators.

I can only see two ways of doing fast (special-cased) iteration:
compile-time special casing or runtime optimization. Compile-time
requires access to the internals (even if one were to use C++ with
advanced template magic a la STL iterators, I don't think one can get
performance if everything is not in the headers, but maybe C++
compilers are super smart these days in ways I can't comprehend). I
would think runtime is the long-term solution, but that's far away.

David
