[Python-Dev] C-level duck typing

Mark Shannon mark at hotpy.org
Wed May 16 14:47:47 CEST 2012


Stefan Behnel wrote:
> Dag Sverre Seljebotn, 16.05.2012 12:48:
>> On 05/16/2012 11:50 AM, "Martin v. Löwis" wrote:
>>>> Agreed in general, but in this case, it's really not that easy. A C
>>>> function call involves a certain overhead all by itself, so calling into
>>>> the C-API multiple times may be substantially more costly than, say,
>>>> calling through a function pointer once and then running over a
>>>> returned C
>>>> array comparing numbers. And definitely way more costly than running over
>>>> an array that the type struct points to directly. We are not talking
>>>> about
>>>> hundreds of entries here, just a few. A linear scan in 64 bit steps over
>>>> something like a hundred bytes in the L1 cache should hardly be
>>>> measurable.
>>> I give up, then. I fail to understand the problem. Apparently, you want
>>> to do something with the value you get from this lookup operation, but
>>> that something won't involve function calls (or else the function call
>>> overhead for the lookup wouldn't be relevant).
>> In our specific case the value would be an offset added to the PyObject*,
>> and there we would find a pointer to a C function (together with a 64-bit
>> signature), and calling that C function (after checking the 64 bit
>> signature) is our final objective.
> 
> I think the use case hasn't been communicated all that clearly yet. Let's
> give it another try.
> 
> Imagine we have two sides, one that provides a callable and the other side
> that wants to call it. Both sides are implemented in C, so the callee has a
> C signature and the caller has the arguments available as C data types. The
> signature may or may not match the argument types exactly (float vs.
> double, int vs. long, ...), because the caller and the callee know nothing
> about each other initially, they just happen to appear in the same program
> at runtime. All they know is that they could call each other through Python
> space, but that would require data conversion, tuple packing, calling,
> tuple unpacking, data unpacking, and then potentially the same thing on the
> way back. They want to avoid that overhead.
> 
> Now, the caller needs to figure out if the callee has a compatible
> signature. The callee may provide more than one signature (i.e. more than
> one C call entry point), perhaps because it is implemented to deal with
> different input data types efficiently, or perhaps because it can
> efficiently convert them to its expected input. So, there is a signature on
> the caller side given by the argument types it holds, and a couple of
> signatures on the callee side that can accept different C data input. Then
> the caller needs to find out which signatures there are and match them
> against what it can efficiently call. It may even be a JIT compiler that
> can generate an efficient call signature on the fly, given a suitable
> signature on the callee side.
> 
> An example for this is an algorithm that evaluates a user provided function
> on a large NumPy array. The caller knows what array type it is operating
> on, and the user provided function may be designed to efficiently operate
> on arrays of int, float and double entries.

Given that use case, can I suggest the following:

Separate the discovery of the function from its use.
By this I mean first look up the function (outside of the loop),
then use the function (inside the loop).

It would then be possible to look up the function pointer using the
standard API, PyObject_GetAttr (or maybe _PyType_Lookup).
Then, when it came to applying the function, the function pointer
could be used directly.

To do this would require an extra builtin-function-like object, which
would wrap the C function pointer. Currently the builtin (C) function
type only supports a very limited range of types for the underlying 
function pointer.
For example, an extended builtin-function could support (among other 
types) the C function type double (*func)(double, double). The extended 
builtin-function would be a Python callable, but would allow C 
extensions such as NumPy to access the underlying C function directly.

The builtin-function declaration would consist of a pointer to the
underlying C function and a type declaration stating which types it
accepts. The VM would be responsible for any unboxing/boxing
required.
E.g. float.__add__ could be constructed from a very simple C function
(that adds two doubles and returns a double) and a type declaration:
(cdouble, cdouble)->cdouble.
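
To make the idea concrete, here is a rough sketch of what such an
object could look like at the C level. The struct layout, field names,
constructor and the "d,d->d" signature encoding are illustrative
assumptions only, not an existing CPython API:

#include <Python.h>

/* Hypothetical layout for an extended builtin-function object. */
typedef struct {
    PyObject_HEAD
    void       *c_func;     /* underlying C function pointer */
    const char *signature;  /* type declaration, e.g. "d,d->d" */
} PyExtendedBuiltinFunctionObject;

/* The C function that float.__add__ might wrap: */
static double
float_add_impl(double a, double b)
{
    return a + b;
}

/* Constructing the wrapper; the constructor name is made up here.
   "d,d->d" encodes (cdouble, cdouble)->cdouble. */
PyObject *float_add =
    PyExtendedBuiltinFunction_New((void *)float_add_impl, "d,d->d");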

Allowable types would be intptr_t, double or PyObject* (and maybe
char*); PyObject* types could be further qualified with their Python
type.
Not allowing char, short, unsigned etc. may seem too restrictive, but
it prevents an explosion of possible types.
Allowing only 3 C-level types and no more than 3 parameters (plus
return) means that all 121 (3**4+3**3+3**2+3**1+3**0) permutations can
be handled without resorting to ctypes/ffi.
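
As a quick sanity check of the arithmetic, the count above is just the
sum of 3**k for k = 0..4:

#include <stdio.h>

int main(void)
{
    /* 3 allowable C-level types, signature slots 0..4
       (mirrors 3**4 + 3**3 + 3**2 + 3**1 + 3**0 in the text). */
    int total = 0, power = 1;
    for (int len = 0; len <= 4; len++) {
        total += power;     /* 3**len combinations of that length */
        power *= 3;
    }
    printf("%d\n", total);  /* prints 121 */
    return 0;
}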

Example usage:

typedef double (*ddd_func)(double, double);
ddd_func cfunc;

/* Discovery: done once, outside the loop. */
PyObject *func = _PyType_Lookup(the_type, the_attribute);
if (func != NULL &&
    Py_TYPE(func) == &Py_ExtendedBuiltinFunction_Type &&
    strcmp(Py_ExtendedBuiltinFunction_TypeOf(func), "d,d->d") == 0)
    cfunc = (ddd_func)Py_ExtendedBuiltinFunction_GetFunctionPtr(func);
else
    goto feature_not_provided;

/* Use: call the C function pointer directly, inside the loop. */
for (;;)
    /* Loop using cfunc */
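
To tie this back to the NumPy use case above, the loop body might look
roughly like this (data and n are assumed to be the caller's array
buffer and length; purely illustrative):

/* Apply the looked-up C function to every element of a
   caller-owned double array, with no per-call boxing/unboxing. */
for (Py_ssize_t i = 0; i < n; i++)
    data[i] = cfunc(data[i], 2.0);   /* e.g. add 2.0 to each element */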

[snip]

Cheers,
Mark.

