[Cython] SEP 201 draft: Native callable objects

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Fri Jun 1 15:49:21 CEST 2012


On 05/31/2012 10:13 PM, Robert Bradshaw wrote:
> On Thu, May 31, 2012 at 12:29 PM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> On 05/31/2012 08:50 PM, Robert Bradshaw wrote:
>>>
>>> On Thu, May 31, 2012 at 7:04 AM, Dag Sverre Seljebotn
>>> <d.s.seljebotn at astro.uio.no>    wrote:
>>>>
>>>> [Discussion on numfocus at googlegroups.com please]
>>>>
>>>> I've uploaded a draft-state SEP 201 (previously CEP 1000):
>>>>
>>>> https://github.com/numfocus/sep/blob/master/sep201.rst
>>>>
>>>> """
>>>> Many callable objects are simply wrappers around native code. This holds
>>>> for
>>>> any Cython function, f2py functions, manually written CPython extensions,
>>>> Numba, etc.
>>>>
>>>> Obviously, when native code calls other native code, it would be nice to
>>>> skip the significant cost of boxing and unboxing all the arguments.
>>>> """
>>>>
>>>>
>>>> The thread about this on the Cython list is almost endless:
>>>>
>>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>>>
>>>> There was a long discussion on the key-comparison vs. interned-string
>>>> approach. I've written both up in SEP 201 since it was the major point of
>>>> contention. There was some benchmarks starting here:
>>>>
>>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>>>
>>>> And why provide a table and not a get_function_pointer starting here:
>>>>
>>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>>>
>>>> For those who followed that and don't want to read the entire spec, the
>>>> aspect of flags is new. How do we avoid to duplicate entries/check
>>>> against
>>>> two signatures for cases like a GIL-holding caller wanting to call a
>>>> nogil
>>>> function? My take: For key-comparison you can compare under a mask, for
>>>> interned-string we should have additional flags field.
>>>>
>>>> The situation is a bit awkward: The Cython list consensus (well, me and
>>>> Robert Bradshaw) decided on what is "Approach 1" (key-comparison) in SEP
>>>> 201. I pushed for that.
>>>>
>>>> Still, now that a month has passed, I just think key-comparison is too
>>>> ugly,
>>>> and that the interning mechanism shouldn't be *that* hard to code up,
>>>> probably 500 lines of C code if one just requires the GIL in a first
>>>> iteration, and that keeping the spec simpler is more important.
>>>>
>>>> So I'm tentatively proposing Approach 2.
>>>
>>>
>>> I'm still not convinced that a hybrid approach, where signatures below
>>> some cutoff are compiled down to keys, is not a worthwhile approach.
>>> This gets around variable-length keys (both the complexity and
>>> possible runtime costs for long keys) and allows simple libraries to
>>> produce and consume fast callables without participating in the
>>> interning mechanism.
>>
>> I still think this gives us the "worst of both worlds", all the
>> disadvantages and none of the advantages.
>
> It avoids the one of the primary disadvantage of keys, namely the
> variable length complexity.
>
>> How many simple libraries are there really? Cython on one end, the
>> magnificently complicated NumPy ufuncs on the other? Thinking big, perhaps
>> PyPy and Julia? Cython, PyPy, Julia would all have to deal with long
>> signatures anyway. And NumPy ufuncs are already complicated so even more
>> low-level stuff wouldn't hurt.
>
> I was thinking of, for example, a differential equation solver written
> in C, C++, or Fortran that could take a PyNativeCallableTable*
> directly, primarily avoiding welding this spec to Python.

I'm not sure how real-world that is in the end. But, the size of Cython 
generated code would be kept down for most modules as it wouldn't need 
to bundle an interner.

AND, a problem with interning is spreading the signature strings all 
over memory (in the event you actually need to look at the contents). 
With a smart interner I guess this can be eliminated to some extent, but 
much better if one doesn't have to worry, and if all short signatures 
are keys you don't.

Playing along:

  a) It'd be very nice to avoid explicit decoding. I think one should be 
able to cast the key to char[]; this a) avoids having to allocate a 
buffer on the stack to pass to a Decode function, b) let's you inspect 
the table in a debugger easily.

  b) Flags are needed in addition to interning; GIL status and exception 
return values do not require exact matches. I think more than 3 bits are 
needed for flags => our minimal padded table entry size is actually 24 
bytes! (And this is OK, my benchmarks weren't affected by 8-byte vs 
16-byte comparisons, branching is so dominating.)

Now, 16 bits seems about right for flags, so this means we can actually 
for free use 14-char keys (12 for signature data, one for \0, one for guard)

That pushes the number of non-interning signatures high enough to make 
it really fit 95% of the use-cases. I feel 6 chars is a little low, 
remember that a "pointer to a double complex" is "&Zd" by itself unless 
we play with encoding.

BUT, it then gets rather complicated to have things work on 
little-endian vs. big-endian though as the guard byte must be in 
different positions. If you want to align the pointer you get this:

typedef struct {
     void *funcptr;
     union {
         union {
             struct {
                 uint16_t interned_flags;
                 uint16_t padding1;
                 uint32_t padding2;
                 uint64_t interned_sig;
             };
             struct {
                 uint16_t flags;
                 char sig[14];
             };
         } big_endian;

         union {
             struct {
                 uint64_t interned_sig;
                 uint32_t padding1;
                 uint16_t padding2;
                 uint16_t interned_flags;
             };
             struct {
                 char sig[14];
                 uint16_t flags;
             };
         } little_endian;
     };
};

(interned_flags and flags is really the same, I just didn't want to mess 
with the struct alignment)

So I think this is *almost* there, but it certainly gets complicated 
because of endianness issues.

Of course, an alternative is to not have the interned_sig be 64-bit 
aligned. Or, play with adapting the string/guard bytes in the middle, 
but that sort of breaks a) above.

Thinking about this is psychologically difficult because it's very 
likely bikeshedding, but OTOH once the spec is out in the wild it will 
never be worth it to change it so some care is called for... oh well, at 
least I'm having fun!

Dag


More information about the cython-devel mailing list