[Cython] [Python-Dev] C-level duck typing

Stefan Behnel stefan_ml at behnel.de
Thu May 17 14:58:43 CEST 2012

mark florisson, 17.05.2012 13:34:
> On 17 May 2012 11:14, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> mark florisson, 17.05.2012 11:26:
>>> On 17 May 2012 08:36, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>>> Dag Sverre Seljebotn, 17.05.2012 09:12:
>>>>> Stefan Behnel wrote:
>>>>>> mark florisson, 16.05.2012 21:49:
>>>>>>> On 16 May 2012 20:15, Stefan Behnel wrote:
>>>>>>>> Why not just use a custom attribute on callables that holds a
>>>>>>>> PyCapsule? Whenever we see inside of a Cython implemented function
>>>>>>>> that an object variable that was retrieved from the outside,
>>>>>>>> either as a function argument or as the result of a function call,
>>>>>>>> is being called, we try to unpack a C function pointer from it on
>>>>>>>> all assignments to the variable. If that works, we can scan for a
>>>>>>>> suitable signature (either right away or lazily on first access)
>>>>>>>> and cache that. On each subsequent call through that variable,
>>>>>>>> the cached C function will be used.
>>>>>>>> That means we'd replace Python variables that are being called by
>>>>>>>> multiple local variables, one that holds the object and one for each C
>>>>>>>> function with a different signature that it is being called with. We
>>>>>>>> set the C function variables to NULL when the Python function variable
>>>>>>>> is being assigned to.
>>>>>>>> When the C function variable is NULL on call, we scan for a matching
>>>>>>>> signature and assign it to the variable.  When no matching signature
>>>>>>>> can be found, we set it to (void*)-1.
>>>>>>>> Additionally, we allow explicit user casts of Python objects to C
>>>>>>>> function types, which would then try to unpack the C function, raising
>>>>>>>> a TypeError on mismatch.
>>>>>>>> Assignments to callable variables can be expected to occur much less
>>>>>>>> frequently than calls to them, so this will give us a good trade-off
>>>>>>>> in most cases. I don't see why this kind of caching would be any slower
>>>>>>>> inside of loops than what we were discussing so far.
>>>>>>> This works really well for local variables, but for globals, def
>>>>>>> methods or callbacks as attributes, this won't work so well, as they
>>>>>>> may be rebound at any time outside of the module scope.
>>>>>> Only half true for globals, which can be declared "cdef object", e.g.
>>>>>> for imported names. That would allow Cython to see all possible
>>>>>> reassignments in a module, which would then apply the above scheme.
>>>>>> I don't think def methods are a use case for this because you'd either
>>>>>> cpdef them or even cdef them if you want speed. If you want them to be
>>>>>> overridable, you'll have to live with the speed penalty that that
>>>>>> implies.
>>>>>> For object attributes, you have to pay the penalty of a lookup anyway,
>>>>>> no way around that. We can't even cache anything here (e.g. with a
>>>>>> borrowed reference) because the attribute may be rebound to another
>>>>>> object that happens to live at the same address as the previous one.
>>>>>> However, if you want speed, you'd do it as in CPython and assign the
>>>>>> object to a local variable to pay for the lookup only once. Problem
>>>>>> solved.
>>>>> 'Problem solved' by pushing the work over to the user? By that line
>>>>> of argument, why not just kill off Cython and require users to write C?
>>>> What part of the work does the above proposal push to the user? To make it
>>>> explicit that an object attribute or a global variable is not expected to
>>>> change during whatever a loop does? Well, yes. If the user knows that, a
>>>> global cdef or an assignment to a local variable is the easiest, safest,
>>>> fastest and most obvious way to tell Cython that it should take advantage
>>>> of it. Why invent yet another declaration for this?
>>>>> Hyperbole aside; do you really believe it is worth dropping a relatively
>>>>> easy optimization just to make the C level code more to the taste of
>>>>> some python-dev posters?
>>>> I find the above much easier for all sides. It's easier to implement for us
>>>> and others, it doesn't have any impact on CPython and I also find it easier
>>>> to understand for users.
>>>> Besides, I was only responding to Mark's remarks (pun not intended) about
>>>> the few cases where this may not immediately yield the expected advantage.
>>>> They are easy to fix, that's all I was saying. In most cases, this simple
>>>> scheme will do the right thing without any user interaction, and it does
>>>> not require any changes or future constraints on CPython.
>>>> So, why not just implement this for now and *then* re-evaluate if we really
>>>> need more, and if we can really do better?
>>> Hm, I think we should implement fast dispatch first
>> Sure, the one builds on the other. The question is only how you'd get at
>> the pointer to the signatures. I say, a PyCapsule in an attribute will do
>> in most cases.
>> So, basically, I suggest implementing a fast, cached dispatch on top of a
>> simple function object attribute first. Then we test and benchmark that,
>> then we can decide what else we need. Once this infrastructure is
>> implemented, adapting it to other ways of finding the signature dispatcher
>> for a given function object will be trivial. And having it available will
>> allow us to state exactly what the performance advantage of each such
>> approach is and to make a case why (or why not) we need to change something
>> outside of Cython in order to get it.
> I guess that's a good idea, although I would suggest typechecking for
> CythonFunction and giving it a pointer to a list of signatures (or
> maybe make it variable sized and put the signatures directly into the
> function). 

Either of the two will do, I think. There will be a slight performance
difference when the CyFunction comes from another module, but that would
only apply to the lookup.
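
To illustrate why only the lookup would be affected, here is a rough Python
model of the caching scheme quoted above. It is a sketch, not Cython's
implementation: CallableSlot, the _signatures attribute and the signature
strings are made-up names, a dict stands in for the per-signature C function
variables, None plays the role of NULL and _NO_MATCH the role of (void*)-1.

```python
# Hypothetical model of the cached dispatch; every name here is illustrative.
_UNSET = None          # stands in for a NULL C function pointer
_NO_MATCH = object()   # stands in for the (void*)-1 "no match" marker


class CallableSlot:
    """Models one callable variable plus its per-signature cache."""

    def __init__(self, obj):
        self.lookups = 0   # counts slow-path signature scans
        self.assign(obj)

    def assign(self, obj):
        # Rebinding the variable resets every cached function to "unset".
        self.obj = obj
        self.cache = {}

    def call(self, signature, *args):
        target = self.cache.get(signature, _UNSET)
        if target is _UNSET:
            # Slow path: scan for a matching signature and cache the result.
            self.lookups += 1
            table = getattr(self.obj, "_signatures", {})
            target = table.get(signature, _NO_MATCH)
            self.cache[signature] = target
        if target is _NO_MATCH:
            return self.obj(*args)   # fall back to the generic call
        return target(*args)         # direct call through the cached entry


def add(a, b):
    return a + b


# Assumed layout: a mapping from signature string to specialised callable.
add._signatures = {"dd->d": lambda a, b: float(a) + float(b)}

slot = CallableSlot(add)
for _ in range(3):
    slot.call("dd->d", 1, 2)   # only the first call scans the table
```

However expensive the scan is, it runs once per assignment and signature;
the calls that follow only pay for the cache check.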

I think it's a good idea to start by only supporting CyFunction to get it
working, then add another fallback for a function attribute, then take a
look at other things.
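
The staged lookup could look roughly like the following Python sketch. The
CyFunction class here is only a stand-in for the real type, and the attribute
name __c_signatures__ and the dict layout are assumptions made for the
example; the real thing would carry a PyCapsule or a signature list at the
C level.

```python
class CyFunction:
    """Hypothetical stand-in for Cython's function type."""

    def __init__(self, func, signatures):
        self.func = func
        self.signatures = signatures   # assumed: signature -> callable

    def __call__(self, *args):
        return self.func(*args)


def find_signatures(obj):
    """Return the signature table of obj, or None if it has none."""
    # Stage 1: exact type check, the cheap case supported first.
    if type(obj) is CyFunction:
        return obj.signatures
    # Stage 2: fallback for arbitrary callables carrying an attribute.
    return getattr(obj, "__c_signatures__", None)


def square(x):
    return x * x


cy_square = CyFunction(square, {"d->d": square})

def plain_square(x):
    return x * x

plain_square.__c_signatures__ = {"d->d": plain_square}
```

Anything that matches neither stage, such as a builtin like len, simply
yields None and keeps going through the normal Python call.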

> If it works well we can plunge ahead and generalize it for
> arbitrary types. (I think in any case we only needed to change things
> outside of Cython to standardize stuff across projects, not because we
> technically need it).


