[Numpy-discussion] Getting C-function pointers from Python to C

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Tue Apr 10 09:15:51 EDT 2012


On 04/10/2012 03:10 PM, Dag Sverre Seljebotn wrote:
> On 04/10/2012 03:00 PM, Nathaniel Smith wrote:
>> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no>   wrote:
>>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote:
>>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant <travis at continuum.io> wrote:
>>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>>>>>
>>>>> ...isn't this an operation that will be performed once per compiled
>>>>> function? Is the overhead of the easy, robust method (calling ctypes.cast)
>>>>> actually measurable as compared to, you know, running an optimizing
>>>>> compiler?
>>>>>
>>>>> Yes, there can be significant overhead.   The compiler is run once and
>>>>> creates the function.   This function is then potentially used many, many
>>>>> times.    Also, it is entirely conceivable that the "build" step happens at
>>>>> a separate "compilation" time, and Numba actually loads a pre-compiled
>>>>> version of the function from disk which it then uses at run-time.
>>>>>
>>>>> I have been playing with a version of this using scipy.integrate and
>>>>> unfortunately the overhead of ctypes.cast is rather significant --- to the
>>>>> point of making the code path that uses these function pointers useless,
>>>>> when without the ctypes.cast overhead the speedup is 3-5x.
>>>>
>>>> Ah, I was assuming that you'd do the cast once outside of the inner
>>>> loop (at the same time you did type compatibility checking and so
>>>> forth).
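In C terms, hoisting the cast out of the loop means the hot path is just
an indirect call (a sketch with illustrative names, not scipy's actual
internals):

    typedef double (*integrand)(double);

    /* Hypothetical sampling loop: the function pointer was extracted and
     * type-checked once, up front; each of the n samples then costs only
     * an indirect call, with no Python or ctypes machinery on the hot
     * path. */
    static double
    sample_sum(integrand f, double a, double h, int n)
    {
        double total = 0.0;
        for (int i = 0; i < n; i++)
            total += f(a + i * h);
        return total;
    }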
>>>>
>>>>> In general, I think NumPy will need its own simple function-pointer object
>>>>> to use when handing raw function pointers between Python and C.   SciPy
>>>>> can then re-use this object, which would also have a useful C-API for
>>>>> things like signature checking.    ctypes is nice, but it is very slow and
>>>>> lacks a compelling C-API.
>>>>
>>>> Sounds reasonable to me. Probably nicer than violating ctypes's
>>>> abstraction boundary, and with no real downsides.
>>>>
>>>>> The kind of new C-level cfuncptr object I imagine has attributes:
>>>>>
>>>>> void *func_ptr;
>>>>> char *signature;  /* something like 'dd->d' to indicate a function
>>>>> that takes two doubles and returns a double */
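Concretely, such an object might look like this at the C level (a sketch
only; the struct name and layout are illustrative, not an existing NumPy
API):

    #include <Python.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical C-level function-pointer object. */
    typedef struct {
        PyObject_HEAD
        void *func_ptr;         /* address of the compiled function      */
        const char *signature;  /* e.g. "dd->d": two doubles in, one out */
    } cfuncptr_object;

    /* A consumer checks the signature once, then calls straight through
     * the raw pointer with no interpreter overhead. */
    static double
    call_dd_d(cfuncptr_object *f, double a, double b)
    {
        typedef double (*dd_d_t)(double, double);
        if (strcmp(f->signature, "dd->d") != 0)
            abort();  /* real code would raise a Python exception */
        return ((dd_d_t)f->func_ptr)(a, b);
    }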
>>>>
>>>> This looks like it's setting us up for trouble later. We already have
>>>> a robust mechanism for describing types -- dtypes. We should use that
>>>> instead of inventing Yet Another baby type system. We'll need to
>>>> convert between this representation and dtypes anyway if you want to
>>>> use these pointers for ufunc loops... and if we just use dtypes from
>>>> the start, we'll avoid having to break the API the first time someone
>>>> wants to pass a struct or array or something.
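At the C level, the dtype-based signature Nathaniel suggests might look
something like this (again purely illustrative; no such struct exists):

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Hypothetical dtype-based signature: one PyArray_Descr per argument
     * plus one for the return value.  Richer and extensible (structs,
     * subarrays), but an equality check must walk nargs + 1 pointers
     * instead of doing a single strcmp. */
    typedef struct {
        PyObject_HEAD
        void *func_ptr;
        int nargs;
        PyArray_Descr **arg_types;  /* nargs argument descriptors */
        PyArray_Descr *ret_type;    /* return-value descriptor    */
    } dtype_funcptr_object;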
>>>
>>> For some of the things we'd like to do with Cython down the line,
>>> something very fast like what Travis describes is exactly what we need;
>>> specifically, if you have Cython code like
>>>
>>> cdef double f(func):
>>>       return func(3.4)
>>>
>>> where func may NOT be called in a loop, so there is nothing to amortize
>>> a separate check-then-call over.
>>>
>>> But I do agree that this sounds like overkill for NumPy+numba at the
>>> moment; certainly for scipy.integrate, where you can amortize the check
>>> over N function samples. But perhaps Travis has a use case I didn't
>>> think of.
>>
>> It sounds sort of like you're disagreeing with me but I can't tell
>> about what, so maybe I was unclear :-).
>>
>> All I was saying was that a list-of-dtype-objects was probably a
>> better way to write down a function signature than some ad-hoc string
>> language. In both cases you'd do some type-compatibility-checking up
>> front and then use C calling afterwards, and I don't see why
>> type-checking would be faster or slower for one representation than
>> the other. (Certainly one wouldn't have to support all possible dtypes
>> up front, the point is just that they give us more room to grow later.)

Rereading this, perhaps this is the statement you seek: Yes, doing a 
simple strcmp is much, much faster than jumping all around in memory to 
check the equality of two lists of dtypes. If the signature is a string 
of less than 8 bytes and the comparison string is known at compile time 
(the Cython case), then the comparison is only a couple of CPU 
instructions, since you can check 64 bits at a time.
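
As a sketch of that trick (assuming the signature buffer is padded so
that at least 8 bytes are readable):

    #include <stdint.h>
    #include <string.h>

    /* Compare an up-to-7-character signature in one 64-bit operation.
     * memcpy sidesteps alignment and aliasing problems; compilers fold
     * it into a single load, so a match costs a load, a compare and a
     * branch. */
    static int
    sig_is_dd_d(const char *sig)
    {
        uint64_t got, want;
        memcpy(&got, sig, 8);           /* assumes 8 readable bytes      */
        memcpy(&want, "dd->d\0\0", 8);  /* "dd->d" NUL-padded to 8 bytes */
        return got == want;
    }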

Dag

>
> My point was that with Cython you'd get cases where there is no
> "up-front", you have to check-and-call as essentially one operation. The
> Cython code above would result in something like this:
>
> if (strcmp("dd->d", signature) == 0) {
>      /* guess on signature and have fast C dispatch for exact match */
> }
> else {
>      /* fall back to calling as Python object */
> }
>
> The strcmp would probably be inlined and unrolled, but you get the idea.
>
> With LLVM available, and if Cython started to use it, we could generate
> more such branches on the fly, making it more attractive.
>
> Dag



