[Cython] CEP1000: Native dispatch through callables

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Fri Apr 13 13:59:45 CEST 2012

On 04/13/2012 01:38 PM, Stefan Behnel wrote:
> Robert Bradshaw, 13.04.2012 12:17:
>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote:
>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote:
>>>> Have you given any thought as to what happens if __call__ is
>>>> re-assigned for an object (or subclass of an object) supporting this
>>>> interface? Or is this out of scope?
>>> Out-of-scope, I'd say. Though you can always write an object that detects if
>>> you assign to __call__...
> +1 for out of scope. This is a pure C level feature.
>>>> Minor nit: I don't think should_dereference is worth branching on, if
>>>> one wants to save the allocation one can still use a variable-sized
>>>> type and point to oneself. Yes, that's an extra dereference, but the
>>>> memory is already likely close and it greatly simplifies the logic.
>>>> But I could be wrong here.
>>> Those minor nits are exactly what I seek; since Travis will have the first
>>> implementation in numba<->SciPy, I just want to make sure that what he does
>>> will work efficiently with Cython.
>> +1
>> I have to admit building/invoking these var-arg-sized __nativecall__
>> records seems painful. Here's another suggestion:
>> struct {
>>      void* pointer;
>>      size_t signature; // compressed binary representation, 95% coverage

Once you start passing around functions that take memory view slices as 
arguments, that 95% estimate will be off I think.

>>      char* long_signature; // used if signature is not representable in
>> a size_t, as indicated by signature = 0
>> } record;
>> These char* could optionally be allocated at the end of the record*
>> for optimal locality. We could even dispense with the binary
>> signature, but having that option allows us to avoid strcmp for stuff
>> like d)d and ffi)f.
> Assuming we use literals and a const char* for the signature, the C
> compiler would cut down the number of signature strings automatically for
> us. And a pointer comparison is the same as a size_t comparison.

I'll go one further: Intern Python bytes objects. It's just a PyObject*, 
but it's *required* (or just strongly encouraged) to have gone through

sig = sys.modules['_nativecall'].interned_db.setdefault(sig, sig)

Obviously in a PEP you'd have a C-API function for such interning 
(completely standalone utility). Performance of interning operation 
itself doesn't matter...

Unless CPython has interning features itself, like in Java? Was that 
present back in the day and then ripped out?

Requiring interning is somewhat less elegant in one way, but it makes a 
lot of other stuff much simpler.

That gives us

struct {
     void *pointer;
     PyBytesObject *signature;
} record;

and then you allocate a NULL-terminated array of these for all the 
signatures a callable supports.

> That would only apply at a per-module level, though, so it would require an
> indirection for the signature IDs. But it would avoid a global registry.
> Another idea would be to set the signature ID field to 0 at the beginning
> and call a C-API function to let the current runtime assign an ID>  0,
> unique for the currently running application. Then every user would only
> have to parse the signature once to adapt to the respective ID and could
> otherwise branch based on it directly.
> For Cython, we could generate a static ID variable for each typed call that
> we found in the sources. When encountering a C signature on a callable,
> either a) the ID variable is still empty (initial case), then we parse the
> signature to see if it matches the expected signature. If it does, we
> assign the corresponding ID to the static ID variable and issue a direct
> call. If b) the ID field is already set (normal case), we compare the
> signature IDs directly and issue a C call if they match. If the IDs do not
> match, we issue a normal Python call.
>>> Right... if we do some work to synchronize the types for Cython modules
>>> generated by the same version of Cython, we're left with 3-4 types for
>>> Cython, right? Then a couple for numba and one for f2py; so on the order of
>>> 10?
>> No, I think each closure is its own type.
> And that even applies to fused functions, right? They'd have one closure
> for each type combination.
>>> An alternative is do something funny in the type object to get across the
>>> offset-in-object information (abusing the docstring, or introduce our own
>>> flag which means that the type object has an additional non-standard field
>>> at the end).
>> It's a hack, but the flag + non-standard field idea might just work...
> Plus, it wouldn't have to stay a non-standard field. If it's accepted into
> CPython 3.4, we could safely use it in all existing versions of CPython.

Sounds good. Perhaps just use a single "extended" flag bit, then add a new 
flag field in our payload, in case we need to extend the type object yet 
again later and run out of unused flag bits (TBD: figure out how many 
unused flag bits there are).
