[pypy-dev] Differences performance Julia / PyPy on very similar codes

Thu Dec 24 01:06:43 EST 2020

On 23.12.20 14:42, PIERRE AUGIER wrote:
> I wrote another very simple benchmark that should not depend on auto-vectorization. The bench function is:
>
> ```python
> def sum_x(positions):
>      result = 0.0
>      for i in range(len(positions)):
>          result += positions[i].x
>      return result
> ```

This benchmark probably really shows the crux of the problem. In Python,
the various Points instances (whether with lists, or with direct
attributes) are vastly more complex beasts than the structs in Julia.
There, you can declare a struct with a certain number of Float64 fields
and be done. Thus, reading .x from such a struct is just a pointer
dereference.

In Python, due to dynamic typing, the ability to add more fields later
and even the ability to change the class of an instance, the actual
memory layout of a Point3D type is much more complex with various
indirections and boxing. Reading .x out of such a thing is done in
several steps:

1) check that positions[i] is an instance
2) check that it's an instance of Point3D
3) read its x field
4) check that the field is a float
5) read the float's value

All of these steps involve a pointer read.

Improving this situation is probably possible (there's even a paper how
to get rid of steps 1 and 2:
https://www.csl.cornell.edu/~cbatten/pdfs/cheng-type-freezing-cgo2020.pdf but
the work wasn't merged). But there are problems:
- basically every single one of these steps needs to be addressed, and
every one is its own optimization
- it's extremely delicate to get the balance and the trade-offs right,
because the object system is so central in getting good performance for
Python code across a wide variety of areas (not just numerical algorithms).

Another approach would indeed be (as you say in the other mail) to add
support for telling PyPy explicitly that some list can contain only
instances of a specific class and (more importantly) that a class is not
to be considered to be "dynamic" meaning that its fields are fixed and
of specific types. So far, we have not really gone in such directions,
because that is language design and we leave that to the CPython devs ;-).

Note that some of your other benchmarks are not measuring what you hope!
eg I suspect that get_objects, get_xs and loop_over_list_of_objects from
your other mail get completely removed by the Julia compiler, since they
don't have side effects and don't compute anything. PyPy isn't actually
able to remove empty loops. So you are comparing empty loops in PyPy
with no code at all in Julia.

Cheers,

Carl Friedrich