Freelist in PyPy? Reuse short lived objects?
Hello,
I was still playing with the idea of speeding up code that uses small numerical objects.
I wrote a Cython extension which defines a Point (3D) cdef class and a Points cdef class (a vector of points). Both classes contain a pointer to a point_ C struct:
    ctypedef struct point_:
        float x, y, z
Of course, any computation with Point objects involves several very short-lived objects, and we really want to avoid all the associated malloc/free calls.
In Cython, one can decorate a cdef class with `@cython.freelist(8)` to reuse objects: https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#f...
I tried to add a bit of logic to avoid freeing and allocating the memory for the struct (https://github.com/paugier/nbabel/blob/master/py/microbench/util_cython.pyx). If I understand correctly, doing such things is possible in CPython because the method __dealloc__ is called as soon as the object is no longer accessible from Python. Or we can use the fact that it's very fast to get the reference count of an instance. But I think this is not the case for PyPy.
Is there an efficient alternative strategy with PyPy?
Pierre
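For readers unfamiliar with the technique, here is a minimal pure-Python sketch of what a free list does. It is not the code from util_cython.pyx: `PointPool`, `acquire` and `release` are hypothetical names, and the explicit `release` call stands in for what Cython does automatically when an instance is deallocated.

```python
class Point:
    """Small 3D point; __slots__ avoids a per-instance __dict__."""
    __slots__ = ("x", "y", "z")

    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z


class PointPool:
    """Hypothetical free list: keeps up to `size` released points for reuse."""

    def __init__(self, size=8):
        self._size = size
        self._free = []

    def acquire(self, x, y, z):
        if self._free:
            p = self._free.pop()   # reuse a previously released object
            p.x, p.y, p.z = x, y, z
            return p
        return Point(x, y, z)      # free list empty: allocate normally

    def release(self, p):
        if len(self._free) < self._size:
            self._free.append(p)   # keep it for the next acquire()
        # otherwise drop the reference and let the interpreter reclaim it


pool = PointPool(size=8)
a = pool.acquire(1.0, 2.0, 3.0)
pool.release(a)
b = pool.acquire(4.0, 5.0, 6.0)
print(b is a, (b.x, b.y, b.z))
```

The second `acquire` hands back the same object with new coordinates. The catch is that something has to call `release` at the right moment, which is exactly the part that relies on prompt deallocation in CPython.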
Well, if you write this in pure Python and run it via PyPy, I imagine that most of the time the Point objects won't be created at all: the JIT will detect that they are created and don't escape the scope of the JIT loop, so they can be ripped apart and stored in locals.
But these sorts of optimizations also make less sense in a GC environment where allocation is (almost) free, and the cost of freeing objects is lowered by bulk reclaiming of objects.
I'd try writing some tests in pure Python, running PyPy with JIT tracing, and seeing what it spits out in the log.
On Thu, Jan 14, 2021 at 8:34 AM PIERRE AUGIER <pierre.augier@univ-grenoble-alpes.fr> wrote:
[...]
Pierre _______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev
-- “One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs.” (Robert Firth)
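A possible starting point for the pure-Python test suggested above, written for Python 3 / PyPy3. This is only a sketch: the `Point` methods and the `accumulate` function are made up for illustration and are not taken from the nbabel benchmark. Each loop iteration creates two temporary Point objects, which is the pattern the JIT would ideally optimize away.

```python
import time


class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y, self.z + other.z)

    def scale(self, factor):
        return Point(self.x * factor, self.y * factor, self.z * factor)


def accumulate(n):
    """Sum n scaled steps; each iteration creates two temporary Points."""
    total = Point(0.0, 0.0, 0.0)
    step = Point(1.0, 2.0, 3.0)
    for _ in range(n):
        total = total + step.scale(0.5)
    return total


if __name__ == "__main__":
    start = time.perf_counter()
    result = accumulate(1000000)
    elapsed = time.perf_counter() - start
    print(result.x, result.y, result.z, "%.3f s" % elapsed)
```

Assuming the script is saved as bench_points.py (a hypothetical name), running it as `PYPYLOG=jit-log-opt:points.log pypy3 bench_points.py` should write the optimized traces to points.log. Whether the allocations really disappear from the loop is what the log is meant to show.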
Hi Pierre,
This is not ready at all and I don't have enough time to work on it at the moment, *however*: I have a small prototype (on the branch map-improvements) that changes the instance layout in PyPy to store type-stable instances with several fields that contain ints or floats much more efficiently. It seems to give a 50% speedup on your micro benchmark, so that's promising. There's still a bug somewhere and it needs very careful investigation whether it costs too much on non-numerical programs, but potentially this is a good improvement.
Cython is not likely to help on PyPy, because the overhead of our C-API emulation is too high. A free list is also unfortunately not really workable for us, since our GC strategy is very different (we don't know when an object is freed).
Cheers,
Carl Friedrich
On January 14, 2021 4:34:12 PM GMT+01:00, PIERRE AUGIER <pierre.augier@univ-grenoble-alpes.fr> wrote:
[...]
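The point about not knowing when an object is freed can be observed from Python with a weak reference. The following is a small sketch (the helper names are invented for illustration): with reference counting, as in CPython, the object is expected to disappear as soon as the last reference is deleted, whereas with a tracing GC it may stay around until a collection runs.

```python
import gc
import weakref


class Point:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


def dead_right_after_del():
    """True if the object is reclaimed as soon as its last reference is deleted."""
    p = Point(1.0, 2.0, 3.0)
    ref = weakref.ref(p)
    del p
    return ref() is None


def dead_after_collect():
    """True if the object is reclaimed once a collection has been forced."""
    p = Point(1.0, 2.0, 3.0)
    ref = weakref.ref(p)
    del p
    gc.collect()
    return ref() is None


print("reclaimed immediately:", dead_right_after_del())
print("reclaimed after gc.collect():", dead_after_collect())
```

A free list that is refilled from a deallocation hook only helps if the first function returns True, which is why the approach does not carry over to an interpreter without reference counting.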
On 15.01.21 07:44, Carl Friedrich Bolz-Tereick wrote:
[...] It seems to give a 50% speedup on your micro benchmark, so that's promising. There's still a bug somewhere [...]
Seems to be even more like a 90% improvement on your microbench (from 13.0 to 7.0). I also fixed the bug. Some more work is needed, but it looks relatively promising at this point.
Cheers,
CF
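To relate the "90%" to the two figures, here is the arithmetic, assuming 13.0 and 7.0 are run times in the same unit (the message does not state the unit, so this is an assumption):

```python
before, after = 13.0, 7.0   # figures quoted above; unit assumed to be run time

speedup = before / after                 # how many times faster the run is
time_saved = (before - after) / before   # fraction of the run time removed

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.0%}")
# prints "speedup: 1.86x, time saved: 46%"
```

Under that reading, the run is about 1.86 times faster (roughly the quoted 90% when expressed as a throughput gain), which corresponds to about 46% less run time.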
----- Original message -----
From: "Carl Friedrich Bolz-Tereick" <cfbolz@gmx.de>
To: "pypy-dev" <pypy-dev@python.org>, "PIERRE AUGIER" <pierre.augier@univ-grenoble-alpes.fr>, "pypy-dev" <pypy-dev@python.org>
Sent: Wednesday, 20 January 2021 12:33:43
Subject: Re: [pypy-dev] Freelist in PyPy? Reuse short lived objects?
Seems to be even more like a 90% improvement on your microbench (from 13.0 to 7.0). I also fixed the bug. Some more work is needed, but it looks relatively promising at this point.
Yes, it looks really promising! It could bring the pure Python implementation much closer to the C and Fortran implementations used for the benchmark in the Nature Astro paper (Zwart, 2020).
Note that I'm doing some power usage measurements with serious hardware, so I'll soon be able to reproduce and extend the figure shown here: https://github.com/paugier/nbabel. It means that I will soon have everything needed to propose a serious reply to Zwart (2020).
To try your version, I guess I need to compile PyPy? And I also guess that it's only for PyPy2 at first?
Pierre
On 1/21/21 2:42 PM, PIERRE AUGIER wrote:
Yes, it looks really promising! It could bring the pure Python implementation much closer to the C and Fortran implementations used for the benchmark in the Nature Astro paper (Zwart, 2020).
Note that I'm doing some power usage measurements with serious hardware so I'll soon be able to reproduce and extend the figure shown here: https://github.com/paugier/nbabel. It means that I will soon have everything to propose a serious reply to Zwart (2020).
Cool!
To try your version, I guess I need to compile PyPy? And I also guess that it's only for PyPy2 first?
If you tell me your platform, I can ask the buildbots to make you a binary (PyPy3 works too; it should not be hard to merge).
Cheers,
CF
participants (3)
- Carl Friedrich Bolz-Tereick
- PIERRE AUGIER
- Timothy Baldridge