[Numpy-discussion] Getting C-function pointers from Python to C

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Tue Apr 10 09:49:40 EDT 2012


On 04/10/2012 03:38 PM, Dag Sverre Seljebotn wrote:
> On 04/10/2012 03:29 PM, Nathaniel Smith wrote:
>> On Tue, Apr 10, 2012 at 2:15 PM, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no>   wrote:
>>> On 04/10/2012 03:10 PM, Dag Sverre Seljebotn wrote:
>>>> On 04/10/2012 03:00 PM, Nathaniel Smith wrote:
>>>>> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn
>>>>> <d.s.seljebotn at astro.uio.no>      wrote:
>>>>>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote:
>>>>>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant<travis at continuum.io>        wrote:
>>>>>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>>>>>>>>
>>>>>>>> ...isn't this an operation that will be performed once per compiled
>>>>>>>> function? Is the overhead of the easy, robust method (calling ctypes.cast)
>>>>>>>> actually measurable as compared to, you know, running an optimizing
>>>>>>>> compiler?
>>>>>>>>
>>>>>>>> Yes, there can be significant overhead.   The compiler is run once and
>>>>>>>> creates the function.   This function is then potentially used many, many
>>>>>>>> times.    Also, it is entirely conceivable that the "build" step happens at
>>>>>>>> a separate "compilation" time, and Numba actually loads a pre-compiled
>>>>>>>> version of the function from disk which it then uses at run-time.
>>>>>>>>
>>>>>>>> I have been playing with a version of this using scipy.integrate and
>>>>>>>> unfortunately the overhead of ctypes.cast is rather significant --- to the
>>>>>>>> point of making the code path that uses these function pointers useless,
>>>>>>>> even though without the ctypes.cast overhead the speed-up is 3-5x.
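
For reference, the pattern being timed is presumably something like the
following minimal sketch (libm's pow() is only a stand-in for a compiled
callback; this is not the actual scipy.integrate code):

    import ctypes, ctypes.util

    # Wrap a C function with ctypes, then recover the raw address via
    # ctypes.cast -- the step whose per-call overhead is at issue here.
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    proto = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)
    cfunc = proto(("pow", libm))
    raw_addr = ctypes.cast(cfunc, ctypes.c_void_p).value
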
>>>>>>>
>>>>>>> Ah, I was assuming that you'd do the cast once outside of the inner
>>>>>>> loop (at the same time you did type compatibility checking and so
>>>>>>> forth).
>>>>>>>
>>>>>>>> In general, I think NumPy will need its own simple function-pointer object
>>>>>>>> to use when handing over raw-function pointers between Python and C.   SciPy
>>>>>>>> can then re-use this object which also has a useful C-API for things like
>>>>>>>> signature checking.    I have seen that ctypes is nice but very slow, and
>>>>>>>> it lacks a compelling C-API.
>>>>>>>
>>>>>>> Sounds reasonable to me. Probably nicer than violating ctypes's
>>>>>>> abstraction boundary, and with no real downsides.
>>>>>>>
>>>>>>>> The kind of new C-level cfuncptr object I imagine has attributes:
>>>>>>>>
>>>>>>>> void *func_ptr;
>>>>>>>> char *signature;   /* something like 'dd->d' to indicate a function
>>>>>>>> that takes two doubles and returns a double */
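
To make the shape of that object concrete, here is a pure-Python mock-up
(illustration only; the real thing would be a C-level object with a C-API,
and the names here are made up):

    class CFuncPtr(object):
        # Stand-in for the proposed object: a raw function address plus a
        # signature string such as "dd->d".
        def __init__(self, func_ptr, signature):
            self.func_ptr = func_ptr      # e.g. obtained via ctypes.cast
            self.signature = signature    # e.g. "dd->d"

    def get_checked_pointer(fp, expected="dd->d"):
        # A consumer (say, an integrator) would check the signature once and
        # then call through func_ptr directly from C.
        if fp.signature != expected:
            raise TypeError("expected %r, got %r" % (expected, fp.signature))
        return fp.func_ptr
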
>>>>>>>
>>>>>>> This looks like it's setting us up for trouble later. We already have
>>>>>>> a robust mechanism for describing types -- dtypes. We should use that
>>>>>>> instead of inventing Yet Another baby type system. We'll need to
>>>>>>> convert between this representation and dtypes anyway if you want to
>>>>>>> use these pointers for ufunc loops... and if we just use dtypes from
>>>>>>> the start, we'll avoid having to break the API the first time someone
>>>>>>> wants to pass a struct or array or something.
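
To make the two representations concrete (purely illustrative, not an actual
NumPy API):

    import numpy as np

    # Travis's proposal: an ad-hoc signature string.
    sig_string = "dd->d"                  # two doubles in, one double out

    # Nathaniel's proposal: the same signature as dtype objects
    # (here, argument types followed by the return type).
    sig_dtypes = (np.dtype('float64'), np.dtype('float64'), np.dtype('float64'))
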
>>>>>>
>>>>>> For some of the things we'd like to do with Cython down the line,
>>>>>> something very fast like what Travis describes is exactly what we need;
>>>>>> specifically, if you have Cython code like
>>>>>>
>>>>>> cdef double f(func):
>>>>>>         return func(3.4)
>>>>>>
>>>>>> that may NOT be called in a loop.
>>>>>>
>>>>>> But I do agree that this sounds like overkill for NumPy+numba at the moment;
>>>>>> certainly for scipy.integrate, where you can amortize over N function
>>>>>> samples. But Travis perhaps has a use case I didn't think of.
>>>>>
>>>>> It sounds sort of like you're disagreeing with me but I can't tell
>>>>> about what, so maybe I was unclear :-).
>>>>>
>>>>> All I was saying was that a list-of-dtype-objects was probably a
>>>>> better way to write down a function signature than some ad-hoc string
>>>>> language. In both cases you'd do some type-compatibility-checking up
>>>>> front and then use C calling afterwards, and I don't see why
>>>>> type-checking would be faster or slower for one representation than
>>>>> the other. (Certainly one wouldn't have to support all possible dtypes
>>>
>>> Rereading this, perhaps this is the statement you seek: Yes, doing a
>>> simple strcmp is much, much faster than jumping all around in memory to
>>> check the equality of two lists of dtypes. If it is a string less than 8
>>> bytes in length with the comparison string known at compile-time (the
>>> Cython case) then the comparison is only a couple of CPU instructions,
>>> as you can check 64 bits at a time.
>>
>> Right, that's what I wasn't getting until you mentioned strcmp :-).
>>
>> That said, the core numpy dtypes are singletons. For this purpose, the
>> signature could be stored as a C array of PyArray_Descr*, but even if we
>> store it in a Python tuple/list, we'd still end up with a contiguous
>> array of PyArray_Descr*'s. (I'm assuming that we would guarantee that
>> it was always-and-only a real PyTupleObject* here.) So for the
>> function we're talking about, the check would compile down to doing
>> the equivalent of a 3*pointersize-byte strcmp, instead of a 5-byte
>> strcmp. That's admittedly worse, but I think the difference between
>> these two comparisons is unlikely to be measurable, considering that
>> they're followed immediately by a cache miss when we actually jump to
>> the function pointer.
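
The singleton point is easy to verify, and it is what would make the tuple
comparison cheap; roughly (illustration only):

    import numpy as np

    # Builtin dtypes are singletons, so equality of two signatures stored as
    # tuples of dtype objects reduces to a few pointer (identity) comparisons.
    assert np.dtype('float64') is np.dtype(np.float64)

    sig = (np.dtype('float64'),) * 3       # e.g. (arg1, arg2, return type)
    expected = (np.dtype(np.float64),) * 3
    matches = all(a is b for a, b in zip(sig, expected))
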

Actually, I think the performance hit is a problem in the Cython case.
There's no place to explicitly pre-check the signature, so the comparison
happens on every call; and it will very often be the case that everything is
already in L1 cache, so the cost isn't hidden behind a cache miss. Consider
f being called in a loop. (And the whole point of the exercise is to avoid
the user having to type the "func" argument.)

Dag


