[Python-Dev] C-level duck typing

Thu May 17 14:14:23 CEST 2012

Mark Shannon, 17.05.2012 12:38:
> Dag Sverre Seljebotn wrote:
>> On 05/16/2012 10:24 PM, Robert Bradshaw wrote:
>>> On Wed, May 16, 2012 at 11:33 AM, "Martin v. Löwis"<martin at v.loewis.de> 
>>> wrote:
>>>>> Does this use case make sense to everyone?
>>>>>
>>>>> The reason why we are discussing this on python-dev is that we are
>>>>> looking for a general way to expose these C level signatures within
>>>>> the Python ecosystem. And Dag's idea was to expose them as part of the
>>>>> type object, basically as an addition to the current Python level
>>>>> tp_call() slot.
>>>>
>>>> The use case makes sense, yet there is also a long-standing solution
>>>> already to expose APIs and function pointers: the capsule objects.
>>>>
>>>> If you want to avoid dictionary lookups on the server side, implement
>>>> tp_getattro, comparing addresses of interned strings.
>>>
>>> Yes, that's an idea worth looking at. The point about implementing
>>> tp_getattro to avoid dictionary lookup overhead is a good one, worth
>>> trying at least. One drawback is that this approach does require the
>>> GIL (as does _PyType_Lookup).
>>>
>>> Regarding the C function being faster than the dictionary lookup (or
>>> at least close enough that the lookup time matters), yes, this happens
>>> all the time. For example one might be solving differential equations
>>> and the "user input" is essentially a set of (usually simple)
>>> double f(double) functions and their derivatives.
>>
>> To underline how performance critical this is to us, perhaps a full
>> Cython example is useful.
>>
>> The following Cython code is a real-world use case. It is not too
>> contrived in the essentials, although simplified a little bit. For
>> instance, undergrad engineering students could pick up Cython just to
>> play with simple scalar functions like this.
>>
>> from numpy import sin
>> # assume sin is a Python callable and that NumPy decides to support
>> # our spec to also support getting a "double (*sinfuncptr)(double)".
>>
>> # Our mission: Avoid making the user manually import "sin" from C,
>> # but allow just using the NumPy object and still be fast.
>>
>> # define a function to integrate
>> cpdef double f(double x):
>>     return sin(x * x) # guess on signature and use "fastcall"!
>>
>> # the integrator
>> def integrate(func, double a, double b, int n):
>>     cdef double s = 0
>>     cdef double dx = (b - a) / n
>>     for i in range(n):
>>         # This is also a fastcall, but can be cached so doesn't
>>         # matter...
>>         s += func(a + i * dx)
>>     return s * dx
>>
>> integrate(f, 0, 1, 1000000)
>>
>> There are two problems here:
>>
>>  - The "sin" global can be reassigned (monkey-patched) between each call
>> to "f", with no way for "f" to know. Even "sin" itself could do the
>> reassignment. So you'd need to check for reassignment to do caching...
> 
> Since Cython allows static typing why not just declare that func can treat
> sin as if it can't be monkeypatched?

You'd simply say

    cdef object sin    # declare it as a C variable of type 'object'
    from numpy import sin

That's also the one obvious way to do it in Cython.

> Moving the load of a global variable out of the loop does seem to be a
> rather obvious optimisation, if it were declared to be legal.

My proposal was to simply extract any C function pointers at assignment
time, i.e. at import time in the example above. Signature matching can then
be done at the first call and the result can be cached as long as the
object variable isn't changed. All of that is local to the module and can
thus easily be controlled at code generation time.
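
A rough sketch of that scheme, written as plain Python rather than as the C
code Cython would actually generate. Everything named here is hypothetical
and only serves the illustration: the "__c_signatures__" attribute, the
"d(d)" signature string, and the CFuncSlot and FakeSin classes. In particular,
nothing like "__c_signatures__" exists in NumPy or CPython.

```python
import math


class CFuncSlot:
    """Models one module-level 'cdef object' variable (hypothetical helper)."""

    def __init__(self):
        self._obj = None
        self._candidates = {}  # signature -> fast function, read at assignment
        self._cached = {}      # signature -> callable chosen at first call

    def assign(self, obj):
        # Assignment time (e.g. "from numpy import sin"): extract whatever
        # fast entry points the object advertises and drop stale matches.
        self._obj = obj
        self._candidates = dict(getattr(obj, "__c_signatures__", {}))
        self._cached.clear()

    def call(self, signature, *args):
        # First call: match the signature once. Later calls reuse the
        # cached result until assign() is called again.
        target = self._cached.get(signature)
        if target is None:
            # Fall back to the generic Python call if nothing matches.
            target = self._candidates.get(signature, self._obj)
            self._cached[signature] = target
        return target(*args)


class FakeSin:
    """Stand-in for a callable that also advertises a fast path."""

    __c_signatures__ = {"d(d)": math.sin}

    def __call__(self, x):
        return math.sin(x)


sin_slot = CFuncSlot()
sin_slot.assign(FakeSin())            # pointers extracted here
value = sin_slot.call("d(d)", 0.0)    # signature matched and cached here
```

Since only code inside the module can rebind the variable, every rebinding
goes through assign(), so no per-call check for monkey-patching is needed.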

Stefan