[pypy-dev] ctypes rawffi and ffi
estama at gmail.com
Tue Dec 13 12:39:53 CET 2011
First of all, many thanks, Antonio, for all this information. It helps
me a lot. For example, for some time I was struggling to find out how
to do callbacks in ffi :-) .
Concerning ctypes: since my last email I found two places where ctypes
can easily be made faster. The first is all the "zip" calls that
happen inside ctypes. Changing them to itertools.izip gives a small
speedup (sorry for not having concrete numbers; I didn't write down the
speed differences).
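A minimal sketch of the zip-to-izip change, for illustration only. The
function name convert_args and its arguments are hypothetical and are not
taken from the ctypes sources. On Python 2, zip builds a full list of
tuples before the loop starts, while izip yields pairs lazily; the
fallback branch only exists so the sketch also runs on Python 3, where
the builtin zip is already lazy.

```python
try:
    from itertools import izip  # Python 2: lazy, builds no intermediate list
except ImportError:
    izip = zip  # Python 3: the builtin zip is already lazy


def convert_args(converters, raw_args):
    """Apply each converter to its matching raw argument (hypothetical helper)."""
    return [conv(arg) for conv, arg in izip(converters, raw_args)]
```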
The second is in _ctypes.basics: I think that the expression
"sys.maxint * 2 + 1" in cdata_from_address is recalculated again and
again. By precomputing it, a sizable speedup can also be had.
Nevertheless, the largest speedup (~7x-10x) I managed to get was when I
mostly bypassed ctypes and used rawffi and ffi directly.
PyPy's regular sqlite3 module runs the attached test (sqlitepypy.py) in
7-10 secs (with the sped-up ctypes). Also, I think that the variability
of the times is due to GC.
After changing sqlitepypy.py to use the (attached) msqlite3 module
instead of sqlite3, the same test needs 1.5 secs.
Regular CPython needs ~500 msec.
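For readers without the attachments, a test of this kind might look
roughly like the sketch below. It is not the attached sqlitepypy.py; the
table, the UDF name addone and the row count are all made up for
illustration. It only relies on the standard sqlite3 module and
Connection.create_function, so that every row triggers one callback
from C into Python.

```python
import sqlite3
import time


def run_udf_benchmark(n_rows=100000):
    """Call a Python UDF once per row and return (result, seconds)."""
    con = sqlite3.connect(":memory:")
    # Every SQL call to addone() crosses from C back into Python.
    con.create_function("addone", 1, lambda x: x + 1)
    con.execute("CREATE TABLE t (x INTEGER)")
    con.executemany("INSERT INTO t VALUES (?)",
                    ((i,) for i in range(n_rows)))
    start = time.time()
    total = con.execute("SELECT sum(addone(x)) FROM t").fetchone()[0]
    elapsed = time.time() - start
    con.close()
    return total, elapsed


if __name__ == "__main__":
    total, elapsed = run_udf_benchmark()
    print("sum=%d elapsed=%.3fs" % (total, elapsed))
```

Swapping the import for another module with the same interface would
allow the same script to compare implementations.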
I would be glad if you took a look at the changed msqlite3 (see
Connection.create_function) and commented on the changes.
I have to say that I like rawffi and ffi a lot more than ctypes. ffi's
simplicity especially is very welcoming.
Ctypes seems to me to be very overengineered. I would very much prefer an
API with which I could simply acquire the value of a C type, or directly
dereference a pointer (it took me some time to find that rawffi.Arrays
are ideal for this), rather than all this "wrapping" that happens with
ctypes.
Also, I would prefer an API where calls and callbacks aren't wrapped. I
had to do a hack in msqlite3 to disable ctypes' callback wrapping.
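To illustrate the kind of wrapping meant here, this is roughly what
reading a C int at a known address looks like with stock ctypes: a
wrapper object is created first, and the Python value is only obtained
by unwrapping it. The helper name read_int_at is hypothetical; the sketch
uses only standard ctypes calls and says nothing about how rawffi does
the same job.

```python
import ctypes


def read_int_at(address):
    """Read a C int at `address`; ctypes builds a wrapper object first."""
    wrapper = ctypes.c_int.from_address(address)  # wrapper shares the memory
    return wrapper.value                          # unwrap to a Python int


source = ctypes.c_int(42)
address = ctypes.addressof(source)  # a plain Python integer
```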
After all of the above tests, I believe that right now it isn't possible
to achieve the same callback speed as regular CPython with PyPy's
current infrastructure. So when will the ffistruct branch be integrated
into PyPy? I would like to run some tests on it :-) .
On 13/12/11 12:10, Antonio Cuni wrote:
> Hello Elefterios,
> On 12/11/2011 09:28 PM, Elefterios Stamatogiannakis wrote:
>> I'm exploring pypy's code so as to speed up callbacks from C code, so as to
>> speed up sqlite module's UDF.
>> I have some questions:
>> - What are the differences between ctypes, rawffi and ffi? Where should each
>> one of them be used?
> _rawffi and _ffi are pypy-specific modules which expose the functionalities of
> libffi. _rawffi is old and slow, while _ffi is new and designed to be JIT
> friendly. However, at the moment of writing not all features of _rawffi have
> been ported to _ffi yet, that's why we need to keep both around.
> ctypes is implemented on top of _rawffi/_ffi. The plan for the future is to
> kill _rawffi at some point.
>> - I see that ctypes is built on top of rawffi and ffi. If one wishes to work
>> around ctypes (so as to not have ctypes' overhead), which of rawffi or ffi
>> should one use? Which of the two is faster at runtime?
> if possible, you should use _ffi. Note that so far with _ffi you can only call
> functions, but e.g. you cannot define a callback. If you are interested in
> this stuff, you might want to look at the ffistruct branch, which adds support
> for jit-friendly structures to _ffi.
> Note that the public interface of _ffi is still fluid, it might change in the
> future. E.g., right now pointers are represented just by using python longs,
> but we might want to use a proper well-typed wrapper in the future.
>> - How can i create a null pointer with _ffi?
> As I said above, right now pointers are passed around as Python longs, so you
> can just use 0 for the null pointer.
>> And some remarks:
>> By only modifying pypy's sqlite module code, i managed to speed up sqlite's
>> callbacks by 30% (for example there is a "for i in range(nargs)" line in
>> _sqlite3. _convert_params, which is a hot path).
> that's nice. Patches are welcome :-)
>> Also the following line in _ctypes/function.py ._wrap_callable
>> args = [argtype._CData_retval(argtype.from_address(arg)._buffer)
>> for argtype, arg in zip(argtypes, args)]
>> Occupies a large percentage of the overall callback time (around 60-70%).
> yes, I think that we never looked at the performance of ctypes callbacks.
> Good spot.
> In other parts of ctypes there are hacks and shortcuts for performance. E.g.,
> in _wrap_result we check whether the result type is primitive, and in that
> case we just avoid calling _CData_retval. Maybe it's possible to introduce a
> similar shortcut there.
>> Assuming that pypy JITs all of the above callback code, is it a problem having
>> all these memory allocations for each callback (my test does 10M callbacks)?
>> Is there a way to avoid as many of these memory allocations as possible?
>> Right now CPython runs my test (10M callbacks) in 1.2 sec and pypy needs from
>> 9 to 14 secs. I suspect that the large spread of pypy's run times is due to GC.
> I think it's "simply" because we never optimized callbacks. When I ported
> ctypes from _rawffi to _ffi I got speedups up to 100 times faster. In case of
> callbacks I expect a minor gain, because the JIT cannot inline across them,
> but I still think there is room for lots of improvements.
> If you are interested in trying it, I'll be more than glad to help you :)