[Cython] New early-binding concept [was: CEP1000]

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Fri Apr 20 08:55:40 CEST 2012


On 04/20/2012 08:49 AM, Dag Sverre Seljebotn wrote:
> On 04/20/2012 08:21 AM, Stefan Behnel wrote:
>> Robert Bradshaw, 20.04.2012 02:52:
>>> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote:
>>>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote:
>>>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote:
>>>>>> Dag Sverre Seljebotn, 18.04.2012 23:35:
>>>>>>>
>>>>>>> from numpy import sqrt, sin
>>>>>>>
>>>>>>> cdef double f(double x):
>>>>>>>     return sqrt(x * x) # or sin(x * x)
>>>>>>>
>>>>>>> Of course, here one could get the pointer in the module at import
>>>>>>> time.
>>>>>>
>>>>>> That optimisation would actually be very worthwhile all by itself.
>>>>>> I mean,
>>>>>> we know what signatures we need for globally imported functions
>>>>>> throughout
>>>>>> the module, so we can reduce the call to a single jump through a
>>>>>> function
>>>>>> pointer (although likely with a preceding NULL check, which the
>>>>>> branch
>>>>>> prediction would be happy to give us for free). At least as long
>>>>>> as sqrt
>>>>>> is not being reassigned, but that should hit the 99% case.
>>>>>>
>>>>>>> However, here:
>>>>>>>
>>>>>>> from numpy import sqrt
>>>>> Correction: "import numpy as np"
>>>>>>>
>>>>>>> cdef double f(double x):
>>>>>>>     return np.sqrt(x * x) # or np.sin(x * x)
>>>>>>>
>>>>>>> the __getattr__ on np sure is larger than any effect we discuss.
>>>>>>
>>>>>> Yes, that would have to stay a .pxd case, I guess.
>>>>>
>>>>> How about this mini-CEP:
>>>>>
>>>>> Modules are allowed to specify __nomonkey__ (or __const__, or
>>>>> __notreassigned__), a list of strings naming module-level variables
>>>>> where
>>>>> "we don't hold you responsible if you assume no monkey-patching of
>>>>> these".
>>>>>
>>>>> When doing "import numpy as np", then (assuming "np" is never
>>>>> reassigned in
>>>>> the module), at import time we check all names looked up from it in
>>>>> __nomonkey__, and if so treat them as "from numpy import sqrt as
>>>>> 'np.sqrt'",
>>>>> i.e. the "np." is just a namespace mechanism.
>>>>
>>>> I like the idea. I think this could be generalized to a 'final'
>>>> keyword, that could also enable optimizations for cdef class
>>>> attributes. So you'd say
>>>>
>>>> cdef final object np
>>>> import numpy as np
>>>>
>>>> For class attributes this would tell the compiler that it will not be
>>>> rebound, which means you could check if attributes are initialized in
>>>> the initializer, or just pull such checks (as well as bounds checks),
>>>> at least for memoryviews, out of loops, without worrying whether it
>>>> will be reassigned in the meantime.
>>>
>>> final is a nice way to describe this. If we were to introduce a new
>>> keyword, static might do as well.
>>>
>>> It seems more natural to do this in the numpy.pxd file (perhaps it
>>> could just be declared as a final object) and that would allow us to
>>> not worry about re-assignment. Cython could then try to keep that
>>> contract for any modules it compiles. (This is, however, a bit more
>>> restrictive, though one can always cimport and import modules under
>>> different names.)
>>
>> However, it's actually not the module that's "final" in this regard
>> but the
>> functions it exports - *they* do not change and neither do their C
>> signatures. So the "final" modifier should stick to the functions
>> (possibly
>> declared at the "cdef extern" line), which would then allow us to resolve
>> and cache the C function pointers at import time.
>
> Are there any advantages to getting this information at compile time
> rather than import time?
>
> If you got the full signature it would be a different matter (for type
> inference etc.); you could essentially do something like
>
> cdef final double sin(double)
> cdef final float sin(float)
> cdef final double cos(double)

In fact, "final" is sort of implied whenever a pxd is provided. The mere
act of providing a pxd means you expect early binding to happen. So I
think this boils down to simply allowing ABIs declared in pxd files to be
resolved through CEP 1000 instead of assuming the module is a Cython module:

cdef double sin(double)
cdef double cos(double)

We could first look for the Cython ABI at import time, and if that isn't 
there, fall back to CEP 1000. And in time, deprecate the Cython ABI in 
favour of CEP 1000 (and follow-up CEPs to make it complete enough).

The __nomonkey__ was something else, a proposal about a pxd-less 
approach. We can do both.
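
To make the pxd-less variant concrete, here is a minimal Python sketch of
the import-time check. The helper name bind_nomonkey and the stand-in
module are invented for illustration; only the __nomonkey__ list itself
comes from the proposal:

```python
import types


def bind_nomonkey(module, used_names):
    """Split names used as `module.<name>` into early- and late-bound.

    Names listed in the module's __nomonkey__ are fetched once, as if
    written `from module import name`; all others stay attribute lookups.
    """
    declared = set(getattr(module, "__nomonkey__", ()))
    early = {n: getattr(module, n) for n in used_names if n in declared}
    late = [n for n in used_names if n not in declared]
    return early, late


# Stand-in for a library module (NOT the real numpy).
np_like = types.ModuleType("np_like")
np_like.sqrt = lambda x: x ** 0.5
np_like.sin = lambda x: x
np_like.__nomonkey__ = ["sqrt"]

early, late = bind_nomonkey(np_like, ["sqrt", "sin"])

# Monkey-patching after import no longer affects the early-bound name,
# which is exactly the liberty __nomonkey__ grants the caller.
np_like.sqrt = None
print(early["sqrt"](9.0), late)
```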

Dag

>
> ...and you would know types at compile-time, and get pointers for those
> at import time.
>
>>
>> That mimics the case of the current "final" classes and methods, where we
>> take off the method pointers at compile time. And inside of numpy.pxd is
>> the perfect place to declare this, not as part of the import.
>
> However,
>
> a) a __finals__ in the NumPy Python module is something the NumPy
> project can maintain, and which can be different on different releases
> etc. (OK, NumPy is special because it is so high profile, but any other
> library)
>
> b) a __finals__ is something PyPy, Numba, etc. could benefit from as well
>
> Of course, one doesn't exclude the other. And if a library implements
> CEP1000 + provides __finals__, it would be trivial to run a pxd
> generator on it.
>
> Dag


