[Cython] CEP1000: Native dispatch through callables

Fri Apr 13 14:27:48 CEST 2012

Dag Sverre Seljebotn, 13.04.2012 13:59:
> On 04/13/2012 01:38 PM, Stefan Behnel wrote:
>> Robert Bradshaw, 13.04.2012 12:17:
>>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote:
>>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote:
>>>>> Minor nit: I don't think should_dereference is worth branching on, if
>>>>> one wants to save the allocation one can still use a variable-sized
>>>>> type and point to oneself. Yes, that's an extra dereference, but the
>>>>> memory is already likely close and it greatly simplifies the logic.
>>>>> But I could be wrong here.
>>>>
>>>>
>>>> Those minor nits are exactly what I seek; since Travis will have the first
>>>> implementation in numba<->SciPy, I just want to make sure that what he
>>>> does will work efficiently with Cython.
>>>
>>> I have to admit building/invoking these var-arg-sized __nativecall__
>>> records seems painful. Here's another suggestion:
>>>
>>> struct {
>>>      void* pointer;
>>>      size_t signature; // compressed binary representation, 95% coverage
> 
> Once you start passing around functions that take memory view slices as
> arguments, that 95% estimate will be off I think.

Yes, I really think it makes sense to keep IDs unique only over the
runtime of the application. (Note that using ssize_t instead of size_t
would allow setting the ID to -1 to disable signature matching, in case
that's ever needed.)

>>>      char* long_signature; // used if signature is not representable in
>>> a size_t, as indicated by signature = 0
>>> } record;
>>>
>>> These char* could optionally be allocated at the end of the record*
>>> for optimal locality. We could even dispense with the binary
>>> signature, but having that option allows us to avoid strcmp for stuff
>>> like d)d and ffi)f.
>>
>> Assuming we use literals and a const char* for the signature, the C
>> compiler would cut down the number of signature strings automatically for
>> us. And a pointer comparison is the same as a size_t comparison.
> 
> I'll go one further: Intern Python bytes objects. It's just a PyObject*,
> but it's *required* (or just strongly encouraged) to have gone through
> 
> sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig)
> 
> Obviously in a PEP you'd have a C-API function for such interning
> (completely standalone utility). Performance of interning operation itself
> doesn't matter...
> 
> Unless CPython has interning features itself, like in Java? Was that
> present back in the day and then ripped out?

AFAIR, it always had to be done explicitly and is only available for
unicode objects in Py3 (and only for bytes objects in Py2). The CPython
parser also does it for identifiers, but it's not done automatically for
anything else. It's also not cheap to do - it would require a weakref dict
to accommodate the temporary allocation of large strings, and weak
references have a certain overhead.

In any case, this is an entirely different use case that should be handled
differently from normal string interning.

> Requiring interning is somewhat less elegant in one way, but it makes a lot
> of other stuff much simpler.
> 
> That gives us
> 
> struct {
>     void *pointer;
>     PyBytesObject *signature;
> } record;
> 
> and then you allocate a NULL-terminated array of these for all the overloads.

However, the problem is the setup. These references will have to be created
at init time and discarded during runtime termination. Not a problem for
Cython generated code, but some overhead for hand written code.

Since the size of these structs is not a problem, I'd prefer keeping Python
objects out of the game and using an ssize_t ID instead, inferred from a
char* signature at module init time by calling a C-API function. That
avoids the need for any cleanup.
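
To make that concrete, here is a rough sketch of how such an ID scheme could look. All names (nativecall_intern_signature, nativecall_find, nativecall_record, NATIVECALL_NO_MATCH) are made up for illustration and not a proposed API. It assumes a POSIX ssize_t, a process-wide table that is simply never freed, and that it is only called from module init code while holding the GIL, so it does no locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t (POSIX; an assumption of this sketch) */

/* Hypothetical names throughout; none of this is an agreed API. */
#define NATIVECALL_NO_MATCH ((ssize_t)-1)

typedef struct {
    void *pointer;      /* native function; NULL terminates the array */
    ssize_t signature;  /* process-local ID, or -1 to disable matching */
} nativecall_record;

/* Process-wide table of signature strings; the index is the ID.
 * It lives until process exit, so no cleanup is needed. */
static char **sig_table = NULL;
static size_t sig_count = 0, sig_capacity = 0;

/* Return the ID for sig, registering it on first use.
 * Returns -1 on allocation failure, which also disables matching. */
ssize_t nativecall_intern_signature(const char *sig)
{
    size_t i;
    char *copy;

    assert(sig != NULL);
    for (i = 0; i < sig_count; i++)
        if (strcmp(sig_table[i], sig) == 0)
            return (ssize_t)i;

    if (sig_count == sig_capacity) {
        size_t new_cap = sig_capacity ? 2 * sig_capacity : 8;
        char **grown = realloc(sig_table, new_cap * sizeof *grown);
        if (grown == NULL)
            return NATIVECALL_NO_MATCH;
        sig_table = grown;
        sig_capacity = new_cap;
    }
    copy = malloc(strlen(sig) + 1);
    if (copy == NULL)
        return NATIVECALL_NO_MATCH;
    strcpy(copy, sig);
    sig_table[sig_count] = copy;
    return (ssize_t)sig_count++;
}

/* Scan a NULL-terminated record array for a matching ID.
 * The strcmp cost is paid once at interning time; this is integer compares. */
void *nativecall_find(const nativecall_record *records, ssize_t wanted)
{
    if (wanted == NATIVECALL_NO_MATCH)
        return NULL;
    for (; records->pointer != NULL; records++)
        if (records->signature == wanted)
            return records->pointer;
    return NULL;
}
```

A module would call the interning function once per signature at init time, store the IDs in its static record array, and callers would do the same for the signature they want before calling the lookup. The linear scan at interning time is deliberately naive, since the performance of the interning operation itself doesn't matter.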

Stefan