On Thu, Dec 19, 2013 at 2:07 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
> On Thu, Dec 19, 2013 at 3:17 AM, Gregory P. Smith <greg@krypto.org> wrote:
>>
>> On Tue, Dec 17, 2013 at 8:43 AM, Stefan Krah <stefan@bytereef.org> wrote:
>>>
>>> Maciej Fijalkowski <fijall@gmail.com> wrote:
>>> > I would like to discuss at the language summit a potential inclusion
>>> > of cffi[1] into stdlib. This is a project Armin Rigo has been working
>>> > on for a while, with some input from other developers.
>>>
>>> I've tried cffi (admittedly only in a toy script) and find it very nice
>>> to use.
>>>
>>> Here's a comparison (pi benchmark) between wrapping libmpdec using a
>>> C-extension (_decimal), cffi and ctypes:
>>>
>>>
>>> +-------------------------------+----------+----------+---------+
>>> |                               | _decimal |  ctypes  |   cffi  |
>>> +===============================+==========+==========+=========+
>>> | cpython-tip (with-system-ffi) |   0.19s  |   5.40s  |  5.14s  |
>>> +-------------------------------+----------+----------+---------+
>>> | cpython-2.7 (with-system-ffi) |    n/a   |   4.46s  |  5.18s  |
>>> +-------------------------------+----------+----------+---------+
>>> |      Ubuntu-cpython-2.7       |    n/a   |   3.63s  |    -    |
>>> +-------------------------------+----------+----------+---------+
>>> |      pypy-2.2.1-linux64       |    n/a   |  125.9s  |  0.94s  |
>>> +-------------------------------+----------+----------+---------+
>>> |     pypy3-2.1-beta1-linux64   |    n/a   |  264.9s  |  2.93s  |
>>> +-------------------------------+----------+----------+---------+
>>>
>>>
>>> I guess the key points are that C-extensions are hard to beat and that
>>> cffi performance on pypy-2 is outstanding. Additionally it's worth noting
>>> that Ubuntu does something in their Python build that we should do, too.
>>
>>
>> Ubuntu compiles their Python with FDO (feedback directed optimization /
>> profile guided optimization) enabled. All distros should do this if they
>> don't already. It's generally a 20% interpreter speedup. Our makefile
>> already supports it but it isn't the default build as it takes a long
>> time given that it needs to compile everything twice and do a profiled
>> benchmark run between compilations.
>>
>> -gps
>
> Hey Greg.
>
> We found that this only speeds up the benchmarks you ran during
> profiling and not others, so we disabled it for the default pypy
> build. Can you point me to a more detailed study of how it
> speeds up interpreters in general and CPython in particular?

That's a common concern for profile-based builds, but it turns out not to
matter a whole lot which workloads you choose for the CPython interpreter
when collecting profiles for an FDO build. I believe Ubuntu's packages just
use the test suite. In our own tests at work this produced good results.
Interpreter loops and other common code paths in the interpreter have a
*lot* of low-hanging fruit in terms of more optimal code generation.
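
One rough way to check this on a given pair of builds is to time a workload
that was not part of the profiling run under both interpreters and compare.
The sketch below is only an illustration: it uses the pi series from the
decimal module documentation's recipes, not Stefan's actual benchmark
script, and the names pi_decimal and time_workload are made up for this
example.

```python
import sys
import time
from decimal import Decimal, localcontext


def pi_decimal(prec=28):
    """Compute pi to `prec` significant digits (series from the decimal docs recipes)."""
    with localcontext() as ctx:
        ctx.prec = prec + 2          # two guard digits for intermediate steps
        three = Decimal(3)
        lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
        while s != lasts:            # stop once another term no longer changes the sum
            lasts = s
            n, na = n + na, na + 8
            d, da = d + da, da + 32
            t = (t * n) / d
            s += t
        ctx.prec = prec
        return +s                    # unary plus rounds to the requested precision


def time_workload(func, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` calls of func()."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Run the same file under each interpreter build and compare the timings.
    elapsed = time_workload(lambda: pi_decimal(2000))
    print(sys.version)
    print("best of 5: %.3fs" % elapsed)
```

Running the same script under the FDO build and the plain build gives one
data point for a workload outside the training set; a single microbenchmark
is of course not a study.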

Link-time optimization adds further benefits IF you can get it working
(not always easy or reliable right now, as Matthias mentions in issue17781).

-gps