[pypy-dev] Differences performance Julia / PyPy on very similar codes

Tue Dec 22 15:50:19 EST 2020

On 22.12.20 16:34, PIERRE AUGIER wrote:
> Here, it is really about what can be done with PyPy, nowadays and in future.

Hi Pierre,

A few somewhat random comments from me.

First note is that you shouldn't run two different implementations that
you are comparing (Point3D and Point4D in this case) within the same
process, since they can influence each other. If I run them in the same
process I get this:

Point3D: 11.426 ms
Point4D: 21.572 ms

In separate processes the latter speeds up:

Point4D: 13.136 ms

(but it doesn't become faster than Point3D, indeed because we don't have
any real SIMD support in the JIT.)
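One way to get that isolation is to launch every measurement in a fresh
interpreter. This is only a minimal sketch: the helper name run_isolated
is made up for illustration, and it assumes each benchmark lives in its
own script that prints its own timing.

```python
import subprocess
import sys


def run_isolated(args, interpreter=sys.executable):
    """Run `interpreter args...` in a fresh process and return its stdout.

    Each call starts a new interpreter, so JIT state (compiled traces,
    type information) from one benchmark cannot influence the next.
    """
    result = subprocess.run(
        [interpreter, *args],
        capture_output=True,  # collect what the child prints
        text=True,            # decode stdout/stderr as str
        check=True,           # raise if the child exits with an error
    )
    return result.stdout.strip()


# Self-contained demonstration: ask a child process which implementation
# it is running on (the benchmark scripts themselves are not included here).
print(run_isolated(["-c", "import sys; print(sys.implementation.name)"]))
```

With hypothetical script names, a call such as
run_isolated(["bench_point3d.py"], interpreter="pypy3") followed by
run_isolated(["bench_point4d.py"], interpreter="pypy3") would measure
each implementation in its own process.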

Next: some information about how to look at the generated code with
PyPy. What I do is look at the JIT IR (which is very close to machine
code, but one abstraction level above it). You get it like this:

PYPYLOG=jit-log-opt,jit-summary,jit-backend-counts:out pypy3 microbench_pypy4.py

This produces a file called 'out' with different sections. I usually
start by looking at the bottom, which shows how often each trace is
entered. This way, you can find the hottest trace:

[26f0c8566379] {jit-backend-counts
...
TargetToken(140179837690368):43692970
TargetToken(140179837690448):74923530
...
[26f0c8567905] jit-backend-counts}

Now I search for the address of the hottest trace to find its IR. The IR
shows traced Python bytecodes interspersed with IR instructions (takes a
bit of time to get used to reading it, but it's not super hard).
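Picking out the hottest trace from the counts section can also be
scripted. The following is a rough sketch, not a PyPy tool: the function
name hottest_traces is invented here, and it assumes the lines look
exactly like the excerpt above (TargetToken(address):count between the
{jit-backend-counts and jit-backend-counts} markers). Any other kinds of
lines in that section are simply skipped.

```python
import re

# Matches lines such as "TargetToken(140179837690448):74923530".
COUNT_LINE = re.compile(r"^TargetToken\((\d+)\):(\d+)$")


def hottest_traces(log_text, top=3):
    """Return up to `top` (token_address, count) pairs, hottest first."""
    in_section = False
    counts = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.endswith("{jit-backend-counts"):
            in_section = True
        elif line.endswith("jit-backend-counts}"):
            in_section = False
        elif in_section:
            match = COUNT_LINE.match(line)
            if match:
                address, count = match.groups()
                counts.append((int(count), int(address)))
    counts.sort(reverse=True)  # largest entry count first
    return [(address, count) for count, address in counts[:top]]


sample = """\
[26f0c8566379] {jit-backend-counts
TargetToken(140179837690368):43692970
TargetToken(140179837690448):74923530
[26f0c8567905] jit-backend-counts}
"""
for address, count in hottest_traces(sample):
    print(address, count)
```

For a real log one would pass open("out").read() instead of the sample
string, then search the file for the first address printed.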

Looking through it, my opinion is that the trace looks quite good.
There are many small inefficiencies (a bit too much pointer chasing, a
bit too much type checking everywhere, a few allocations that aren't
necessary), but no single missed optimization that could
immediately give a 5x speedup.

This also matches my expectations of how a shootout between
Julia and PyPy would end up: PyPy is much faster than CPython for
algorithmic pure Python code (~150x on my laptop! that's really good
:-)). But it can't really beat a "serious" ahead-of-time compiler for a
statically typed language that specifically targets numerical code. That
is for several reasons, the most important ones being that 1) PyPy has a
lot less time to produce code, given that it does it at runtime, and 2)
PyPy has to support the full dynamically typed language Python, where
really random things can be done at runtime and PyPy must still always
observe the Python semantics.
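A toy illustration of point 2 (not taken from the benchmark; the Point
class and the total function are hypothetical). A method can be replaced
while the program runs, so compiled code for the loop has to keep
checking that its assumptions about Point still hold:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm2(self):
        return self.x * self.x + self.y * self.y


def total(points):
    # A JIT can specialize this loop for Point.norm2, but only under a
    # check that the class has not been changed in the meantime.
    result = 0.0
    for p in points:
        result += p.norm2()
    return result


points = [Point(1.0, 2.0) for _ in range(4)]
before = total(points)

# Perfectly legal Python: swap the method out at runtime.
Point.norm2 = lambda self: 0.0
after = total(points)

print(before, after)
```

The second call must see the new method, which is the kind of behaviour
a compiler for a more static language never has to account for.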

That said, I can understand that 5x slower is still a somewhat
disappointing result and I suspect given enough effort we could maybe
get it down to around 3x slower.

Cheers,

Carl Friedrich