CFFI: better performance when calling a function from address

Hi all, I am JIT-generating functions in my code and calling them by first ffi.cast'ing their addresses to a function pointer. In my microbenchmarks it has pretty much the same call performance as when using cffi ABI mode (dumping the functions to a shared library first) and is around 250ns per call slower than when using API mode. I haven't looked at the generated assembly yet, but I guess pypy has to be more careful, since not all information from API mode is available. So the question is: is there any way to provide the missing information, since I basically know everything about the function behind the pointer that a compiler would know? Cheers, Dimitri.
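[For context, the pattern in question looks roughly like this. It's a minimal sketch that uses libc's ``strlen`` as a stand-in for a JIT-generated function; the variable names and the signature are illustrative, and ``ffi.dlopen(None)`` assumes a Unix-like system.]

```python
import cffi

ffi = cffi.FFI()
ffi.cdef("size_t strlen(const char *);")
C = ffi.dlopen(None)  # ABI mode: open the C library itself (Unix only)

# Pretend this integer came from a JIT: the raw address of the function.
addr = int(ffi.cast("uintptr_t", C.strlen))

# Rebuild a callable function pointer from the bare address.
fptr = ffi.cast("size_t(*)(const char *)", addr)
print(fptr(b"hello"))  # -> 5
```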

Hi Dimitri, On Wed, 26 Sep 2018 at 21:19, Dimitri Vorona via pypy-dev <pypy-dev@python.org> wrote:
In my microbenchmarks it has pretty much the same call performance as when using cffi ABI mode (dumping the functions to a shared library first) and is around 250ns per call slower than when using API mode.
I doubt that these microbenchmarks are relevant. But just in case, I found out that the JIT is producing two extra instructions in the ABI case, if you call ``lib.foobar()``. These two instructions are caused by reading the ``foobar`` method on the ``lib`` object. If you write instead ``foobar()``, with either ``foobar = lib.foobar`` or ``from _x_cffi.lib import foobar`` done earlier, then the speed is exactly the same. A bientôt, Armin.
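[The binding trick Armin suggests can be illustrated on any attribute lookup. This is a plain-Python sketch, not cffi code: ``Lib``/``foobar`` stand in for a cffi ``lib`` object and its function.]

```python
import timeit

class Lib:  # stand-in for a cffi in-line ABI `lib` object
    @staticmethod
    def foobar():
        return 42

lib = Lib()

# Attribute read inside the hot loop:
t_attr = timeit.timeit(lambda: lib.foobar(), number=100_000)

# Hoist the lookup out once, as suggested above:
foobar = lib.foobar
t_bound = timeit.timeit(lambda: foobar(), number=100_000)

# On PyPy, the bound version avoids the two extra instructions
# (reading `foobar` off `lib` and guarding on it) in the loop.
print(t_attr, t_bound)
```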

Hi Carl Friedrich, On Wed, 26 Sep 2018 at 22:28, Carl Friedrich Bolz-Tereick <cfbolz@gmx.de> wrote:
Couldn't that slowness of getattr be fixed by making the lib objects eg use module dicts or something?
If we use the out-of-line API mode then ``lib`` is an RPython object, but if we use the in-line ABI mode then it's a pure Python object. More precisely, it's a singleton instance of a newly created Python class, and the two extra instructions are reading and guard_value'ing the map. It might be possible to rewrite the whole section of pure-Python code that builds the ``lib`` for the in-line ABI mode, but that looks like it would be even slower on CPython. And I don't like the idea of duplicating---or even making any unnecessary changes to---this slightly-fragile-looking logic... Anyway, I'm not sure I understand how just a guard_value on the map of an object can cause a 250 ns slow-down. I'd rather have thought it would cause no measurable difference. Maybe I missed another difference. Maybe the effect is limited to microbenchmarks. Likely another mystery of modern CPUs. A bientôt, Armin.
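[To make the difference concrete: the in-line ABI ``lib`` is roughly equivalent to the following pure-Python construction. This is a heavily simplified sketch, not cffi's actual code; ``make_lib`` and ``FFILibrary``'s contents here are illustrative.]

```python
def make_lib(functions):
    # A fresh class is created per lib; its single instance carries the
    # functions as instance attributes. On PyPy, that instance has a "map"
    # (hidden class) which the JIT must read and guard_value on for each
    # `lib.foobar` lookup it cannot hoist out.
    class FFILibrary:
        pass
    lib = FFILibrary()
    for name, fn in functions.items():
        setattr(lib, name, fn)
    return lib

lib = make_lib({"foobar": lambda: 42})
print(lib.foobar())  # -> 42
```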

Hi guys, thanks for your replies. 250ns sounded like a lot, and apparently it was: I can't reproduce it anymore. Thanks for the confirmation that API and ABI modes should have the same performance. I looked at the jitlog, and the API, ABI, and cast-pointer versions seem to produce exactly the same code (assuming I bind the function to its own variable). Cheers, Dimitri. On Wed, Sep 26, 2018 at 11:05 PM Armin Rigo <armin.rigo@gmail.com> wrote:

Hi again, On Wed, 26 Sep 2018 at 21:19, Dimitri Vorona via pypy-dev <pypy-dev@python.org> wrote:
In my microbenchmarks it has pretty much the same call performance as when using cffi ABI mode (dumping the functions to a shared library first) and is around 250ns per call slower than when using API mode.
I haven't looked at the generated assembly yet, but I guess pypy has to be more careful, since not all information from API mode is available.
Just for completeness, the documentation of CFFI says indeed that the API mode is faster than the ABI mode. That's true mostly on CPython, where the ABI mode always requires using libffi for the calls, which is slow. On PyPy, the JIT has got enough information to do mostly the same thing in the end. (And before the JIT, PyPy uses libffi both for the ABI and the API mode for simplicity.) A bientôt, Armin.
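[For completeness, the API mode Armin refers to builds a real C extension at install time, roughly like this. The module name ``_x_cffi`` and the toy ``foobar`` are illustrative; this sketch needs a C compiler available when ``ffi.compile()`` runs.]

```python
import cffi

ffi = cffi.FFI()
ffi.cdef("int foobar(void);")
# In API mode, real C source is compiled in; calls then go through
# generated C stubs instead of libffi, which is what makes it fast
# on CPython.
ffi.set_source("_x_cffi", """
    static int foobar(void) { return 42; }
""")

if __name__ == "__main__":
    ffi.compile(verbose=False)   # writes and compiles _x_cffi.c
    from _x_cffi import lib
    print(lib.foobar())          # direct call into compiled code
```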

participants (3)
- Armin Rigo
- Carl Friedrich Bolz-Tereick
- Dimitri Vorona