[Cython] [Python-Dev] C-level duck typing

mark florisson markflorisson88 at gmail.com
Thu May 17 13:34:36 CEST 2012

On 17 May 2012 11:14, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 17.05.2012 11:26:
>> On 17 May 2012 08:36, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>> Dag Sverre Seljebotn, 17.05.2012 09:12:
>>>> Stefan Behnel wrote:
>>>>> mark florisson, 16.05.2012 21:49:
>>>>>> On 16 May 2012 20:15, Stefan Behnel wrote:
>>>>>>> Why not just use a custom attribute on callables that hold a
>>>>>>> PyCapsule? Whenever we see inside of a Cython implemented function
>>>>>>> that an object variable that was retrieved from the outside,
>>>>>>> either as a function argument or as the result of a function call,
>>>>>>> is being called, we try to unpack a C function pointer from it on
>>>>>>> all assignments to the variable. If that works, we can scan for a
>>>>>>> suitable signature (either right away or lazily on first access)
>>>>>>> and cache that. On each subsequent call through that variable,
>>>>>>> the cached C function will be used.
>>>>>>> That means we'd replace Python variables that are being called by
>>>>>>> multiple local variables, one that holds the object and one for each C
>>>>>>> function with a different signature that it is being called with. We
>>>>>>> set the C function variables to NULL when the Python function variable
>>>>>>> is being assigned to.
>>>>>>> When the C function variable is NULL on call, we scan for a matching
>>>>>>> signature and assign it to the variable.  When no matching signature
>>>>>>> can be found, we set it to (void*)-1.
>>>>>>> Additionally, we allow explicit user casts of Python objects to C
>>>>>>> function types, which would then try to unpack the C function, raising
>>>>>>> a TypeError on mismatch.
>>>>>>> Assignments to callable variables can be expected to occur much less
>>>>>>> frequently than calls to them, so this will give us a good trade-off
>>>>>>> in most cases. I don't see why this kind of caching would be any slower
>>>>>>> inside of loops than what we were discussing so far.
>>>>>> This works really well for local variables, but for globals, def
>>>>>> methods or callbacks as attributes, this won't work so well, as they
>>>>>> may be rebound at any time outside of the module scope.
>>>>> Only half true for globals, which can be declared "cdef object", e.g.
>>>>> for imported names. That would allow Cython to see all possible
>>>>> reassignments in a module, which would then apply the above scheme.
>>>>> I don't think def methods are a use case for this because you'd either
>>>>> cpdef them or even cdef them if you want speed. If you want them to be
>>>>> overridable, you'll have to live with the speed penalty that that
>>>>> implies.
>>>>> For object attributes, you have to pay the penalty of a lookup anyway,
>>>>> no way around that. We can't even cache anything here (e.g. with a
>>>>> borrowed reference) because the attribute may be rebound to another
>>>>> object that happens to live at the same address as the previous one.
>>>>> However, if you want speed, you'd do it as in CPython and assign the
>>>>> object to a local variable to pay for the lookup only once. Problem
>>>>> solved.
>>>> 'Problem solved' by pushing the work over to the user? By that line
>>>> of argument, why not just kill off Cython and require users to write C?
>>> What part of the work does the above proposal push to the user? To make it
>>> explicit that an object attribute or a global variable is not expected to
>>> change during whatever a loop does? Well, yes. If the user knows that, a
>>> global cdef or an assignment to a local variable is the easiest, safest,
>>> fastest and most obvious way to tell Cython that it should take advantage
>>> of it. Why invent yet another declaration for this?
>>>> Hyperbole aside; do you really believe it is worth dropping a relatively
>>>> easy optimization just to make the C-level code more to the taste of
>>>> some python-dev posters?
>>> I find the above much easier for all sides. It's easier to implement for us
>>> and others, it doesn't have any impact on CPython and I also find it easier
>>> to understand for users.
>>> Besides, I was only responding to Mark's remarks (pun not intended) about
>>> the few cases where this may not immediately yield the expected advantage.
>>> They are easy to fix, that's all I was saying. In most cases, this simple
>>> scheme will do the right thing without any user interaction, and it does
>>> not require any changes or future constraints on CPython.
>>> So, why not just implement this for now and *then* re-evaluate if we really
>>> need more, and if we can really do better?
>> Hm, I think we should implement fast dispatch first
> Sure, the one builds on the other. The question is only how you'd get at
> the pointer to the signatures. I say, a PyCapsule in an attribute will do
> in most cases.
> So, basically, I suggest to implement a fast, cached dispatch on top of a
> simple function object attribute first. Then we test and benchmark that,
> then we can decide what else we need. Once this infrastructure is
> implemented, adapting it to other ways of finding the signature dispatcher
> for a given function object will be trivial. And having it available will
> allow us to state exactly what the performance advantage of each such
> approach is and to make a case why (or why not) we need to change something
> outside of Cython in order to get it.

I guess that's a good idea, although I would suggest type-checking for
CythonFunction and giving it a pointer to a list of signatures (or
maybe making it variable-sized and putting the signatures directly into
the function). If it works well, we can plunge ahead and generalize it to
arbitrary types. (In any case, I think the only reason to change things
outside of Cython was to standardize this across projects, not because
we technically need it.)

> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
