[Cython] CEP1000: Native dispatch through callables

Robert Bradshaw robertwb at gmail.com
Fri Apr 13 22:54:12 CEST 2012

On Fri, Apr 13, 2012 at 12:52 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Robert Bradshaw, 13.04.2012 20:21:
>> On Fri, Apr 13, 2012 at 10:26 AM, Robert Bradshaw wrote:
>>> On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn wrote:
>>>> On 04/13/2012 01:38 PM, Stefan Behnel wrote:
>>>>> That would only apply at a per-module level, though, so it would
>>>>> require an indirection for the signature IDs. But it would avoid a
>>>>> global registry.
>>>>> Another idea would be to set the signature ID field to 0 at the beginning
>>>>> and call a C-API function to let the current runtime assign an ID>  0,
>>>>> unique for the currently running application. Then every user would only
>>>>> have to parse the signature once to adapt to the respective ID and could
>>>>> otherwise branch based on it directly.
>>>>> For Cython, we could generate a static ID variable for each typed call
>>>>> that
>>>>> we found in the sources. When encountering a C signature on a callable,
>>>>> either a) the ID variable is still empty (initial case), then we parse the
>>>>> signature to see if it matches the expected signature. If it does, we
>>>>> assign the corresponding ID to the static ID variable and issue a direct
>>>>> call. If b) the ID field is already set (normal case), we compare the
>>>>> signature IDs directly and issue a C call it they match. If the IDs do not
>>>>> match, we issue a normal Python call.
>>> If I understand correctly, you're proposing
>>> struct {
>>>  char* sig;
>>>  long id;
>>> } sig_t;
>>> Where comparison would (sometimes?) compute id from sig by augmenting
>>> a global counter and dict? Might be expensive to bootstrap, but
>>> eventually all relevant ids would be filled in and it would be quick.
> Yes. If a function is only called once, the overhead won't matter. And
> starting from the second call, it would either be fast if the function
> signature matches or slow anyway if it doesn't match.

There's still data locality issues, including the cached id for the
caller as well as the callee.

>>> Interesting. I wonder what the performance penalty would be over
>>> assuming id is statically computed lots of the time, and using that to
>>> compare against fixed values. And there's memory locality issues as
>>> well.
>> To clarify, I'd really like to have the following as fast as possible:
>> if (callable.sig.id == X) {
>>    // yep, that's what I thought
>> } else {
>>    // generic call
>> }
>> Alternatively, one can imagine wanting to do:
>> switch (callable.sig.id) {
>>     case X:
>>         // I can do this
>>     case Y:
>>         // this is common and fast as well
>>     ...
>>     default:
>>         // generic call
>> }
> Yes, that's the idea.
>> There is some question about how promotion should work (e.g. should
>> this flexibility reside in the caller or the callee (or both, though
>> that could result in a quadratic number of comparisons)?)
> Callees could expose multiple signatures (which would result in a direct
> call for each, without further comparisons), then the caller would have to
> choose between those. However, if none matches exactly, the caller might
> want to promote its arguments and try more signatures. In any case, it's
> the caller that does the work, never the callee.
> We could generate code like this:
>    /* cdef int x = ...
>     * cdef long y = ...
>     * cdef int z       # interesting: what if z is not typed?
>     * z = func(x, y)
>     */
>    if (func.sig.id == id("[int,long] -> int")) {
>         z = ((cast)func.cfunc) (x,y);
>    } else if (sizeof(long) > sizeof(int) &&
>               (func.sig.id == id("[long,long] -> int"))) {
>         z = ((cast)func.cfunc) ((long)x, y);
>    } etc. ... else {
>         /* pack and call as Python function */
>    }
> Meaning, the C compiler could reduce the amount of optimistic call code at
> compile time.

Interesting idea. Alternatively, I wonder if the signature could
reflect exactly-sized types rather than int/long/etc. Perhaps that
would make the code more complicated on both ends...

I'm assuming your id(...) is computed at compile time in this example,
right? Otherwise it would get a bit messier.

- Robert

More information about the cython-devel mailing list