[pypy-dev] Differences performance Julia / PyPy on very similar codes

Bengt Richter bokr at bokr.com
Thu Dec 24 04:52:48 EST 2020

Hi Carl,

On +2020-12-24 07:06:43 +0100, Carl Friedrich Bolz-Tereick wrote:
> On 23.12.20 14:42, PIERRE AUGIER wrote:
> > I wrote another very simple benchmark that should not depend on auto-vectorization. The bench function is:
> > 
> > ```python
> > def sum_x(positions):
> >      result = 0.0
> >      for i in range(len(positions)):
> >          result += positions[i].x
> >      return result
> > ```
> This benchmark probably really shows the crux of the problem. In Python,
> the various Points instances (whether with lists, or with direct
> attributes) are vastly more complex beasts than the structs in Julia.
> There, you can declare a struct with a certain number of Float64 fields
> and be done. Thus, reading .x from such a struct is just a pointer
> dereference.
> In Python, due to dynamic typing, the ability to add more fields later
> and even the ability to change the class of an instance, the actual
> memory layout of a Point3D type is much more complex with various
> indirections and boxing. Reading .x out of such a thing is done in
> several steps:
> 1) check that positions[i] is an instance
> 2) check that it's an instance of Point3D
> 3) read its x field
> 4) check that the field is a float
> 5) read the float's value
> All of these steps involve a pointer read.

Could crafted asserts tell the compiler that dynamic stuff is not happening?
Is the compiler listening? :)

> Improving this situation is probably possible (there's even a paper how
> to get rid of steps 1 and 2:
> https://www.csl.cornell.edu/~cbatten/pdfs/cheng-type-freezing-cgo2020.pdf but
> the work wasn't merged). But there are problems:
> - basically every single one of these steps needs to be addressed, and
> every one is its own optimization
> - it's extremely delicate to get the balance and the trade-offs right,
> because the object system is so central in getting good performance for
> Python code across a wide variety of areas (not just numerical algorithms).
> Another approach would indeed be (as you say in the other mail) to add
> support for telling PyPy explicitly that some list can contain only
> instances of a specific class and (more importantly) that a class is not
> to be considered to be "dynamic" meaning that its fields are fixed and
> of specific types. So far, we have not really gone in such directions,
> because that is language design and we leave that to the CPython devs ;-).
assert is already there :)

> Note that some of your other benchmarks are not measuring what you hope!
> eg I suspect that get_objects, get_xs and loop_over_list_of_objects from
> your other mail get completely removed by the Julia compiler, since they
> don't have side effects and don't compute anything. PyPy isn't actually
> able to remove empty loops. So you are comparing empty loops in PyPy
> with no code at all in Julia.

And benchmarks that don't segregate outliers, or clusters, and just average
are terrible. Scattergrams are much more informative :)

E.g., in a scattergram you can discover that a CPU shortcutting multiply by zero
affects a subset of your computations.

> Cheers,
> Carl Friedrich
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> https://mail.python.org/mailman/listinfo/pypy-dev

Happy Holidays!

Bengt Richter

More information about the pypy-dev mailing list