[pypy-dev] Unicode encode/decode speed

Tue Feb 12 19:14:13 CET 2013

On 12/02/13 11:04, Maciej Fijalkowski wrote:
>
> I would like to see some evidence about it. Did you try valgrind?
>
> Cheers,
> fijal
>

Even better, we wanted to find a way for you to be able to test it by 
yourselves, so we tried to create a representative synthetic benchmark.

Surprisingly, when we retested the benchmark that we had previously
posted to this mailing list, we found that its performance profile
is very similar to that of the slow query that I've talked about in my
recent emails.

To make it easier, I'll repeat the refreshed instructions (from the old
email) on how to run that benchmark. Also attached is the updated (and
heavily optimized) MSPW:

--repost--

To run it you'll need the latest madIS. You can clone it using:

hg clone https://code.google.com/p/madis/

For running the test with CPython you'll need:

CPython 2.7 + APSW:

https://code.google.com/p/apsw/

For PyPy you'll need MSPW renamed to "apsw.py" (the attached MSPW is
already renamed to "apsw.py").
Move "apsw.py" to PyPy's "site-packages" directory. For MSPW to work in
PyPy, you'll also need CFFI and "libsqlite3" installed.

To run the test with PyPy:

pypy mterm.py < mspw_bench.sql

or with CPython:

python mterm.py < mspw_bench.sql

The timings of "mspw_bench" that we get are:

CPython 2.7 + APSW: ~ 2.6sec
PyPy + MSPW: ~ 4sec
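
One way to reproduce timings like the ones above is a small wall-clock
harness. This is only a minimal sketch: the helper name
"time_benchmark" is hypothetical (it is not part of madIS), and it
assumes "mterm.py" and "mspw_bench.sql" are in the current directory.
It uses time.time() so that it also runs on CPython 2.7.

```python
import subprocess
import time


def time_benchmark(interpreter, script, sql_path):
    """Run `interpreter script < sql_path` once; return wall-clock seconds."""
    with open(sql_path, "rb") as sql_file:
        start = time.time()
        # Feed the SQL file on stdin, like the shell redirect "< file".
        subprocess.check_call([interpreter, script], stdin=sql_file)
        return time.time() - start


# Example (assumed file locations):
#   print(time_benchmark("pypy", "mterm.py", "mspw_bench.sql"))
#   print(time_benchmark("python", "mterm.py", "mspw_bench.sql"))
```

Note that a single run includes interpreter startup and, for PyPy, JIT
warm-up, so repeating the run a few times gives a steadier number.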

There are two ways to adjust the processing load of mspw_bench.

One is to change the value in "range(xxxxx)". In essence, this
creates a bigger or smaller "synthetic text", which puts more or less
pressure on the CPython/PyPy side.

The other way is to adjust the window size of textwindow(t, xx, xx).
This puts more pressure on the wrapper (APSW/MSPW) because it changes
the number of columns that CPython/PyPy have to send to SQLite (they are
sent one value at a time).
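
To illustrate how the two knobs scale the work, here is a rough sketch
in plain Python. It is NOT madIS's textwindow implementation:
"sliding_windows" and "values_crossing_wrapper" are hypothetical
helpers, and the window here is simply a fixed number of consecutive
tokens. It only counts how many single values would have to cross the
wrapper for a given text size and window width.

```python
def sliding_windows(tokens, width):
    """Yield one row (a tuple of `width` columns) per window position."""
    for start in range(len(tokens) - width + 1):
        yield tuple(tokens[start:start + width])


def values_crossing_wrapper(n_tokens, width):
    """Count values handed to SQLite if each column is sent one at a time."""
    # Stand-in for the "range(xxxxx)" knob: a synthetic text of n_tokens words.
    tokens = [str(i) for i in range(n_tokens)]
    # Stand-in for the window-size knob: each row carries `width` columns.
    return sum(len(row) for row in sliding_windows(tokens, width))


# A bigger text means more rows; a wider window means more values per row.
small = values_crossing_wrapper(1000, 3)
wide = values_crossing_wrapper(1000, 10)
```

Under these assumptions the count is (n_tokens - width + 1) * width, so
widening the window mostly multiplies the per-value wrapper overhead.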

--/repost--

Attached you'll find our latest MSPW (renamed to "apsw.py") and
"mspw_bench.sql".

Also, we are looking into adding a special ffi.string_decode_UTF8 to
CFFI's backend, to reduce the number of calls that are needed to go from
utf8_char* to PyPy's unicode.
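
For reference, the two-step path that such a function would collapse
looks roughly like this with today's CFFI API. This is a sketch, not
the code in MSPW: "utf8_ptr_to_unicode" is a hypothetical helper, and
the buffer made with ffi.new() only stands in for a char* that SQLite
would return.

```python
from cffi import FFI

ffi = FFI()


def utf8_ptr_to_unicode(ptr):
    """Convert a NUL-terminated UTF-8 char* into a unicode string."""
    # Call 1: copy the C string into a Python byte string.
    raw = ffi.string(ptr)
    # Call 2: decode the bytes into unicode.
    return raw.decode("utf-8")


# Stand-in for a UTF-8 char* coming back from SQLite.
c_text = ffi.new("char[]", u"caf\u00e9 \u03bb".encode("utf-8"))
text = utf8_ptr_to_unicode(c_text)
```

A combined call would avoid creating the intermediate byte string for
every value read from SQLite.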

Do you think that such an addition would be worthwhile?

Thank you,

lefteris

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mspw_bench.sql
Type: text/x-sql
Size: 124 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/pypy-dev/attachments/20130212/db8af507/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: apsw.py
Type: text/x-python
Size: 67124 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/pypy-dev/attachments/20130212/db8af507/attachment-0001.py>