[pypy-dev] ctypes rawffi and ffi

Tue Dec 13 12:39:53 CET 2011

First of all, many thanks, Antonio, for all this information. It helps 
me a lot. For example, for some time I was struggling to find out how 
to do callbacks in ffi :-) .

Concerning ctypes: since my last email I found two places where ctypes 
can easily be made faster. The first is all the "zip" calls inside 
ctypes. Changing them to itertools.izip gives a small speedup (sorry 
for not having concrete numbers, but I didn't write down the speed 
differences).
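
As a rough sketch of the kind of change meant here (not the actual 
ctypes patch): on Python 2, zip builds a complete list of tuples, while 
itertools.izip yields the pairs lazily. The function name convert_args 
is hypothetical, and the fallback to the built-in zip is only there so 
the snippet also runs on Python 3, where zip is already lazy.

```python
try:
    from itertools import izip  # Python 2: lazy pairing, no intermediate list
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy


def convert_args(converters, args):
    # Hypothetical stand-in for the per-call loops inside ctypes:
    # pair each converter with its argument without building a temporary list.
    return [conv(arg) for conv, arg in izip(converters, args)]
```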

The second is in _ctypes.basics: I think that the expression 
"sys.maxint * 2 + 1" in cdata_from_address is recomputed on every 
call. Computing it once up front also gives a sizable speedup.

Nevertheless, the largest speedup I achieved (~7x-10x) came when I 
mostly bypassed ctypes and used rawffi and ffi directly.

PyPy's regular sqlite3 module runs the attached test (sqlitepypy.py) in 
7-10 secs (with the sped-up ctypes). I think that the variability of 
the times is due to GC.

After changing sqlitepypy.py to use the (attached) msqlite3 module 
instead of sqlite3, the same test needs 1.5 sec.

Regular CPython needs ~500 msec.

I would be glad if you took a look at the changed msqlite3 (see 
Connection.create_function) and commented on the changes.

I have to say that I like rawffi and ffi a lot more than ctypes. ffi's 
simplicity especially is very welcoming.

Ctypes seems to me to be very overengineered. I would much prefer an 
API with which I could simply read the value of a C type, or directly 
dereference a pointer (it took me some time to find that rawffi.Arrays 
are ideal for this), to all the "wrapping" that happens with ctypes.
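
To illustrate the "wrapping" with plain ctypes (standard library calls 
only; the variable names are just for illustration): reading a C int 
back from a raw address first builds a wrapper object and only then 
unwraps it into a Python int.

```python
import ctypes

value = ctypes.c_int(42)            # a C int owned by a ctypes object
address = ctypes.addressof(value)   # its raw address as a plain Python integer

# Step 1: build a ctypes wrapper object around the address.
wrapped = ctypes.c_int.from_address(address)
# Step 2: unwrap it to get the actual Python int.
result = wrapped.value

# Dereferencing a pointer goes through the same kind of wrapper objects.
pointer = ctypes.pointer(value)
dereferenced = pointer.contents.value
```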

Also, I would prefer an API where calls and callbacks aren't wrapped. I 
had to do a hack in msqlite3 to disable ctypes' callback wrapping.
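
For reference, this is roughly what a wrapped callback looks like in 
plain ctypes (a sketch using only the standard CFUNCTYPE factory; the 
names IntCallback and add_one are hypothetical and have nothing to do 
with msqlite3). Every call through the callback object converts the 
arguments and the result between C and Python representations.

```python
import ctypes

# Prototype for a C function taking one int and returning an int.
IntCallback = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


def add_one(x):
    # x arrives as a plain Python int, already converted by ctypes.
    return x + 1


# Wrapping the Python function yields an object that C code can call.
callback = IntCallback(add_one)

# Calling it from Python goes through the same conversion layer.
result = callback(41)
```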

After all of the above tests, I believe that right now it isn't possible 
to achieve the same callback speed as regular CPython with PyPy's 
current infrastructure. So when will the ffistruct branch be integrated 
into PyPy? I would like to run some tests on it :-) .

l.

On 13/12/11 12:10, Antonio Cuni wrote:
> Hello Elefterios,
>
> On 12/11/2011 09:28 PM, Elefterios Stamatogiannakis wrote:
>> I'm exploring pypy's code so as to speed up callbacks from C code, so as to
>> speed up sqlite module's UDF.
>>
>> I have some questions:
>>
>>   - What are the differences between ctypes, rawffi and ffi. Where should each
>> one of them be used?
>
> _rawffi and _ffi are pypy-specific modules which expose the functionalities of
> libffi.  _rawffi is old and slow, while _ffi is new and designed to be JIT
> friendly.  However, at the moment of writing not all features of _rawffi have
> been ported to _ffi yet, that's why we need to keep both around.
>
> ctypes is implemented on top of _rawffi/_ffi.  The plan for the future is to
> kill _rawffi at some point.
>
>> - I see that ctypes is built on top of rawffi and ffi. If one wishes to work
>> around ctypes (so as to not have ctype's overhead) which of the rawffi or ffi
>> should he use? Which of the two is faster at runtime?
>
> if possible, you should use _ffi. Note that so far with _ffi you can only call
> functions, but e.g. you cannot define a callback.  If you are interested in
> this stuff, you might want to look at the ffistruct branch, which adds support
> for jit-friendly structures to _ffi.
>
> Note that the public interface of _ffi is still fluid, it might change in the
> future.  E.g., right now pointers are represented just by using python longs,
> but we might want to use a proper well-typed wrapper in the future.
>
>>   - How can i create a null pointer with _ffi?
>
> As I said above, right now pointers are passed around as Python longs, so you
> can just use 0 for the null pointer.
>
>> And some remarks:
>>
>> By only modifying pypy's sqlite module code, i managed to speed up sqlite's
>> callbacks by 30% (for example there is a "for i in range(nargs)" line in
>> _sqlite3. _convert_params, which is a hot path).
>
> that's nice. Patches are welcome :-)
>
>> Also the following line in _ctypes/function.py ._wrap_callable
>>
>> args = [argtype._CData_retval(argtype.from_address(arg)._buffer)
>>                          for argtype, arg in zip(argtypes, args)]
>>
>> Occupies a large percentage of the overall callback time (around 60-70%).
>
> yes, I think that we never looked at performance of ctypes callback. Good spot
> :-).
>
> In other parts of ctypes there are hacks and shortcuts for performances. E.g.,
> in _wrap_result we check whether the result type is primitive, and in that
> case we just avoid calling _CData_retval.  Maybe it's possible to introduce a
> similar shortcut there.
>
>> Assuming that pypy JITs all of the above callback code, is it a problem having
>> all these memory allocations for each callback (my test does 10M callbacks)?
>> Is there a way to avoid these memory allocations as much as possible?
>>
>> Right now CPython runs my test (10M callbacks) in 1.2 sec and pypy needs from
>> 9 to 14 secs. I suspect that the large spread of pypy's run times is due to GC.
>
> I think it's "simply" because we never optimized callbacks. When I ported
> ctypes from _rawffi to _ffi I got speedups of up to 100 times. In case of
> callbacks I expect a minor gain, because the JIT cannot inline across them,
> but I still think there is room for lots of improvements.
>
> If you are interested in trying it, I'll be more than glad to help you :)
>
> ciao,
> Anto

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sqlitetest.zip
Type: application/zip
Size: 10527 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/pypy-dev/attachments/20111213/5553cd9d/attachment.zip>