[Cython] New early-binding concept [was: CEP1000]

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Fri Apr 20 08:49:32 CEST 2012

On 04/20/2012 08:21 AM, Stefan Behnel wrote:
> Robert Bradshaw, 20.04.2012 02:52:
>> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote:
>>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote:
>>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote:
>>>>> Dag Sverre Seljebotn, 18.04.2012 23:35:
>>>>>> from numpy import sqrt, sin
>>>>>> cdef double f(double x):
>>>>>>      return sqrt(x * x) # or sin(x * x)
>>>>>> Of course, here one could get the pointer in the module at import time.
>>>>> That optimisation would actually be very worthwhile all by itself. I mean,
>>>>> we know what signatures we need for globally imported functions throughout
>>>>> the module, so we can reduce the call to a single jump through a function
>>>>> pointer (although likely with a preceding NULL check, which the branch
>>>>> prediction would be happy to give us for free). At least as long as sqrt
>>>>> is not being reassigned, but that should hit the 99% case.
>>>>>> However, here:
>>>>>> from numpy import sqrt
>>>> Correction: "import numpy as np"
>>>>>> cdef double f(double x):
>>>>>>      return np.sqrt(x * x) # or np.sin(x * x)
>>>>>> the __getattr__ on np sure is larger than any effect we discuss.
>>>>> Yes, that would have to stay a .pxd case, I guess.
>>>> How about this mini-CEP:
>>>> Modules are allowed to specify __nomonkey__ (or __const__, or
>>>> __notreassigned__), a list of strings naming module-level variables where
>>>> "we don't hold you responsible if you assume no monkey-patching of these".
>>>> When doing "import numpy as np", then (assuming "np" is never reassigned in
>>>> the module), at import time we check all names looked up from it in
>>>> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'",
>>>> i.e. the "np." is just a namespace mechanism.
>>> I like the idea. I think this could be generalized to a 'final'
>>> keyword, that could also enable optimizations for cdef class
>>> attributes. So you'd say
>>> cdef final object np
>>> import numpy as np
>>> For class attributes this would tell the compiler that it will not be
>>> rebound, which means you could check if attributes are initialized in
>>> the initializer, or just pull such checks (as well as bounds checks),
>>> at least for memoryviews, out of loops, without worrying whether it
>>> will be reassigned in the meantime.
>> final is a nice way to describe this. If we were to introduce a new
>> keyword, static might do as well.
>> It seems more natural to do this in the numpy.pxd file (perhaps it
>> could just be declared as a final object) and that would allow us to
>> not worry about re-assignment. Cython could then try to keep that
>> contract for any modules it compiles. (This is, however, a bit more
>> restrictive, though one can always cimport and import modules under
>> different names.)
> However, it's actually not the module that's "final" in this regard but the
> functions it exports - *they* do not change and neither do their C
> signatures. So the "final" modifier should stick to the functions (possibly
> declared at the "cdef extern" line), which would then allow us to resolve
> and cache the C function pointers at import time.

Are there any advantages to getting this information at compile time
rather than import time?

If you got the full signature it would be a different matter (for type 
inference etc.); you could essentially do something like

cdef final double sin(double)
cdef final float sin(float)
cdef final double cos(double)

...and you would know types at compile-time, and get pointers for those 
at import time.

> That mimics the case of the current "final" classes and methods, where we
> take off the method pointers at compile time. And inside of numpy.pxd is
> the perfect place to declare this, not as part of the import.


a) a __finals__ in the NumPy Python module is something the NumPy 
project can maintain, and which can be different on different releases 
etc. (OK, NumPy is special because it is so high profile, but any other 

b) a __finals__ is something PyPy, Numba, etc. could benefit from as well

Of course, one doesn't exclude the other. And if a library implements 
CEP1000 + provides __finals__, it would be trivial to run a pxd 
generator on it.
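
To make the __finals__ idea concrete, here is a minimal Python sketch.
Everything in it is an assumption for illustration: "fakelib" is a
stand-in module built on the fly, bind_finals() is a hypothetical helper,
and __finals__ is only the attribute name proposed in this thread, not
something NumPy provides.

```python
import math
import types

# Stand-in for a library module that opts in to the proposed convention.
fakelib = types.ModuleType("fakelib")
fakelib.sqrt = math.sqrt
fakelib.sin = math.sin
fakelib.__finals__ = ["sqrt"]  # names the library promises not to rebind


def bind_finals(module, names_used):
    """Split the names a consumer uses into early-bound and late-bound.

    Names listed in module.__finals__ are fetched once and cached;
    all others must still be looked up on the module at call time.
    """
    finals = set(getattr(module, "__finals__", ()))
    early = {n: getattr(module, n) for n in names_used if n in finals}
    late = [n for n in names_used if n not in finals]
    return early, late


# Done once at import time of the consuming module.
early, late = bind_finals(fakelib, ["sqrt", "sin"])

# Monkey-patching afterwards does not affect the early-bound reference,
# which is exactly the behaviour the library signed up for.
fakelib.sqrt = lambda x: -1.0
```

A consumer that ignores __finals__ simply keeps doing ordinary attribute
lookups, so the list is purely an opt-in hint.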

