[Python-Dev] C-level duck typing

Wed May 16 16:46:16 CEST 2012

On 05/16/2012 02:16 PM, Stefan Behnel wrote:
> Stefan Behnel, 16.05.2012 13:13:
>> Dag Sverre Seljebotn, 16.05.2012 12:48:
>>> On 05/16/2012 11:50 AM, "Martin v. Löwis" wrote:
>>>>> Agreed in general, but in this case, it's really not that easy. A C
>>>>> function call involves a certain overhead all by itself, so calling into
>>>>> the C-API multiple times may be substantially more costly than, say,
>>>>> calling through a function pointer once and then running over a
>>>>> returned C array comparing numbers. And definitely way more costly than
>>>>> running over an array that the type struct points to directly. We are
>>>>> not talking about hundreds of entries here, just a few. A linear scan in
>>>>> 64 bit steps over something like a hundred bytes in the L1 cache should
>>>>> hardly be measurable.
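The "linear scan over an array that the type struct points to directly" can be made concrete with a minimal sketch. The names `ext_entry`, `toy_type`, and `toy_type_lookup` are hypothetical stand-ins for whatever layout a type object would actually carry; none of this is CPython API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a 64-bit key plus a payload (for example an
   offset into the object). Not a CPython structure. */
typedef struct {
    uint64_t key;
    size_t   value;
} ext_entry;

/* Hypothetical stand-in for a type object that points at a short table. */
typedef struct {
    const ext_entry *entries;
    size_t           count;
} toy_type;

/* Plain linear scan comparing 64-bit keys. Returns 1 and stores the
   payload in *out on a hit, 0 if the key is not present. */
int toy_type_lookup(const toy_type *tp, uint64_t key, size_t *out)
{
    for (size_t i = 0; i < tp->count; i++) {
        if (tp->entries[i].key == key) {
            *out = tp->entries[i].value;
            return 1;
        }
    }
    return 0;
}
```

With only a few 16-byte entries the loop touches a couple of cache lines at most, which is why a hash table or a separate C-API call buys nothing at this size.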
>>>>
>>>> I give up, then. I fail to understand the problem. Apparently, you want
>>>> to do something with the value you get from this lookup operation, but
>>>> that something won't involve function calls (or else the function call
>>>> overhead for the lookup wouldn't be relevant).
>>>
>>> In our specific case the value would be an offset added to the PyObject*,
>>> and there we would find a pointer to a C function (together with a 64-bit
>>> signature), and calling that C function (after checking the 64 bit
>>> signature) is our final objective.
>>
>> I think the use case hasn't been communicated all that clearly yet. Let's
>> give it another try.
>>
>> Imagine we have two sides, one that provides a callable and the other side
>> that wants to call it. Both sides are implemented in C, so the callee has a
>> C signature and the caller has the arguments available as C data types. The
>> signature may or may not match the argument types exactly (float vs.
>> double, int vs. long, ...), because the caller and the callee know nothing
>> about each other initially, they just happen to appear in the same program
>> at runtime. All they know is that they could call each other through Python
>> space, but that would require data conversion, tuple packing, calling,
>> tuple unpacking, data unpacking, and then potentially the same thing on the
>> way back. They want to avoid that overhead.
>>
>> Now, the caller needs to figure out if the callee has a compatible
>> signature. The callee may provide more than one signature (i.e. more than
>> one C call entry point), perhaps because it is implemented to deal with
>> different input data types efficiently, or perhaps because it can
>> efficiently convert them to its expected input. So, there is a signature on
>> the caller side given by the argument types it holds, and a couple of
>> signatures on the callee side that can accept different C data input. Then
>> the caller needs to find out which signatures there are and match them
>> against what it can efficiently call. It may even be a JIT compiler that
>> can generate an efficient call signature on the fly, given a suitable
>> signature on the callee side.
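The matching step can be sketched as follows, under the assumption that the caller ranks the signatures it can use (an exact match for its data first, then ones it can convert to cheaply) and picks the first one the callee offers. The enum values and `match_signature` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature IDs for a unary function over one element type. */
enum { SIG_INT = 1, SIG_FLOAT = 2, SIG_DOUBLE = 3 };

typedef struct {
    uint64_t signature;
    void   (*funcptr)(void);
} native_entry;

/* `wanted` lists the caller's acceptable signatures, best first. Returns
   the index of the callee entry to use, or -1 if nothing matches (in which
   case the caller would go through Python space instead). */
ptrdiff_t match_signature(const native_entry *callee, size_t ncallee,
                          const uint64_t *wanted, size_t nwanted)
{
    for (size_t w = 0; w < nwanted; w++)
        for (size_t c = 0; c < ncallee; c++)
            if (callee[c].signature == wanted[w])
                return (ptrdiff_t)c;
    return -1;
}
```

For example, a caller holding float data might accept `{SIG_FLOAT, SIG_DOUBLE}`; against a callee offering only double and int entry points it would pick the double one.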
>>
>> An example of this is an algorithm that evaluates a user provided function
>> on a large NumPy array. The caller knows what array type it is operating
>> on, and the user provided function may be designed to efficiently operate
>> on arrays of int, float and double entries.
>>
>> Does this use case make sense to everyone?
>>
>> The reason why we are discussing this on python-dev is that we are looking
>> for a general way to expose these C level signatures within the Python
>> ecosystem. And Dag's idea was to expose them as part of the type object,
>> basically as an addition to the current Python level tp_call() slot.
>
> ... and to finish the loop that I started here (sorry for being verbose):
>
> The proposal that Dag referenced describes a more generic way to make this
> kind of extension to type objects from user code. Basically, it allows
> implementers to say "my type object has capability X", in a C-ish kind of
> way. And the above C signature protocol would be one of those capabilities.
>
> Personally, I wouldn't mind making the specific signature extension a
> proposal instead of asking for a general extension mechanism for arbitrary
> capabilities (although that still sounds tempting).
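The generic "my type object has capability X" idea, gated by a single flag bit, could be sketched like this. `TOY_HAVE_EXT_TABLE`, `CAP_NATIVECALL`, and the table layout are hypothetical; in the proposal the bit would live in `tp_flags`, and the bit position used here is invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag meaning "this type carries an extension table".
   The bit position is made up for illustration. */
#define TOY_HAVE_EXT_TABLE (1UL << 30)

/* Hypothetical capability ID; the value is arbitrary. */
#define CAP_NATIVECALL UINT64_C(0x6e617463616c6c00)

typedef struct {
    uint64_t    capability;  /* which capability this entry describes */
    const void *data;        /* capability-specific payload */
} ext_entry;

typedef struct {
    unsigned long    flags;
    const ext_entry *ext;      /* only meaningful if TOY_HAVE_EXT_TABLE is set */
    size_t           ext_count;
} toy_type;

/* Returns the payload for capability `cap`, or NULL if the type does not
   advertise it. Types without the flag bit cost a single bit test. */
const void *toy_get_capability(const toy_type *tp, uint64_t cap)
{
    if (!(tp->flags & TOY_HAVE_EXT_TABLE))
        return NULL;
    for (size_t i = 0; i < tp->ext_count; i++)
        if (tp->ext[i].capability == cap)
            return tp->ext[i].data;
    return NULL;
}
```

The flag check is what keeps older types, which know nothing about the table, safe to query.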

Here are some reasons for the generic proposal:

a) Avoid premature PEP-ing. Look at PEP 3118, for instance; it would
almost certainly have been better if there had been a few years of
beta-testing in the wild among Cython and NumPy users.

I think PEP-ing the "nativecall" proposal soon (even in the unlikely 
event that it would be accepted) is bound to give suboptimal results -- 
it needs to be tested in the wild on Cython and SciPy users for a few 
years first. (Still, we can't ask those to recompile their Python.)

My proposal is then about allowing people to play with their own slots, 
and deploy that to users, without having to create a PEP for their 
specific use case.

b) There's more than the "nativecall" we'd use this for in Cython. 
Something like compiled abstract base classes/compiled multiple 
inheritance/Go-style interfaces for instance. Some of the things we'd
like to use it for will certainly never become PEPs.

c) Get NumPy users off their PyObject_TypeCheck habit, which IMO is
damaging to the NumPy project. You can't easily play around with
different array libraries and new ideas: NumPy is the only array type
you can ever have, because millions of lines of code have been written
using its C API. My proposal provides a way of moving that API over to
accept any object implementing a NumPy-specified spec. We certainly
don't want a 20 nanosecond speed regression on every single call they
make to the NumPy C API, and you simply don't rewrite millions of lines
of code.
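A minimal sketch of what "accept any object implementing a NumPy-specified spec" could mean at the C level, in place of a type check against one concrete array type. Everything here is hypothetical: the vtable `toy_array_spec`, the capability ID, `toy_array_sum`, and the example type `my_array`. It is not NumPy's actual C API.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_TOY_ARRAY UINT64_C(0x0000000000000002) /* arbitrary capability ID */

struct toy_object;

/* Hypothetical spec an array library could publish: a small vtable. */
typedef struct {
    size_t  (*length)(struct toy_object *self);
    double *(*data)(struct toy_object *self);
} toy_array_spec;

typedef struct {
    uint64_t    capability;
    const void *data;
} ext_entry;

typedef struct {
    const ext_entry *ext;
    size_t           ext_count;
} toy_type;

typedef struct toy_object {
    const toy_type *type;
} toy_object;

/* Look up the array spec on the object's type, or NULL if absent. */
static const toy_array_spec *get_array_spec(const toy_object *obj)
{
    const toy_type *tp = obj->type;
    for (size_t i = 0; i < tp->ext_count; i++)
        if (tp->ext[i].capability == CAP_TOY_ARRAY)
            return (const toy_array_spec *)tp->ext[i].data;
    return NULL;
}

/* API function that accepts any implementing object rather than only one
   concrete array type. Sets *ok to 0 and returns 0.0 if unsupported. */
double toy_array_sum(toy_object *obj, int *ok)
{
    const toy_array_spec *spec = get_array_spec(obj);
    double sum = 0.0;
    if (spec == NULL) {
        *ok = 0;
        return sum;
    }
    size_t n = spec->length(obj);
    const double *d = spec->data(obj);
    for (size_t i = 0; i < n; i++)
        sum += d[i];
    *ok = 1;
    return sum;
}

/* Example third-party type implementing the spec. */
typedef struct {
    toy_object base;   /* first member, so toy_object* can be cast back */
    size_t     n;
    double    *values;
} my_array;

static size_t my_length(toy_object *self) { return ((my_array *)self)->n; }
static double *my_data(toy_object *self) { return ((my_array *)self)->values; }

static const toy_array_spec my_spec = { my_length, my_data };
static const ext_entry my_ext[] = { { CAP_TOY_ARRAY, &my_spec } };
const toy_type my_array_type = { my_ext, 1 };
```

Existing callers keep calling the same entry point; only the check inside it changes from "is this our type?" to "does this type advertise the spec?".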

I think having millions of lines of "Python" code written in C, and not 
Python, and considering 20 nanoseconds as "much", is perhaps not the 
typical use case on this list. Still, that's the world of scientific
computing with Python. Python-the-interpreter is just the "shell" around 
the real stuff that all happens in C or Fortran.

(Cython is not just about scientific computing, as I'm sure Stefan has 
told you all about. But in other situations I think there's less need
for "cross-talk" between extensions that bypasses the Python API.)

I guess I don't get "if something needs to be fast on the C level, then
that one specific use case should be in a PEP". And all we're asking for
is really that one bit in tp_flags.

Dag