New Python/PyPy extension for object-oriented numerically intensive codes?
Hello, I wish you a Happy New Year.

I would be very interested in using PyPy for numerically intensive scientific codes. I conclude from my experiments that PyPy could potentially be a great tool, but that it is strongly limited in this area by the lack of some features in Python. For example:

- Standard Python classes and standard Python instances are too dynamic for what we need for high performance. There should be a way to define less dynamic classes and instances in Python.
- There is no specialized container to gather a fixed number of homogeneous objects and, when possible, to store them efficiently as native arrays of native variables.
- There is no way to locally disable type and bound checks.

I guess we don't have these features in Python because they wouldn't be very useful for CPython and because Python has not been designed for performance. However, with efficient interpreters, I think many users would be happy to trade a bit of dynamism for better performance.

I thought about what was missing, and it seems to me that it could be provided by a Python/PyPy extension without any addition to the Python language. However, the Numpy API is IMHO not adapted to this. Now that Python is (and will be more and more) compared to Julia, I think it becomes necessary to have a good tool to write efficient numerical codes in pure Python style.

I present here https://github.com/paugier/nbabel/blob/master/py/vector.md a possible new extension providing what I think would be needed to express, in OOP Python, things reasonable in terms of high performance computing. The detail of the proposed API is of course not very interesting at this point (it is just a dream). I am more interested in the point of view of PyPy developers and PyPy users about (i) the principle of this project (a Python extension to express in OOP Python things easier to accelerate than standard Python) and (ii) the technical feasibility of this project: Is it technically possible to extend Python and PyPy to develop such an extension and make it very efficient? Which tools should be used? How should it be written?

Best regards,
Pierre
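(A small plain-Python illustration of the first two points, not from the original mail: standard instances accept new attributes at any time, and standard containers box every element as a full Python object, so nothing guarantees a compact native layout.)

```python
# Standard classes and containers are fully dynamic: nothing here can be
# stored as a native array of doubles without sophisticated JIT analysis.
class Point:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

p = Point(1.0, 2.0, 3.0)
p.anything = "allowed"          # attributes can be added at any time

points = [Point(i, i, i) for i in range(1000)]
points.append("not a Point")    # lists are heterogeneous by design
```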
On Tue, 5 Jan 2021, PIERRE AUGIER wrote:
I thought about what was missing and it seems to me that it could be provided by a Python/PyPy extension without addition to the Python language. However, Numpy API is IMHO not adapted. Now that Python is (and will be more and more) compared to Julia, I think it becomes necessary to have a good tool to write efficient numerical codes in pure Python style.
Hi Pierre,

I assume that you've had a detailed look at Cython, haven't you? All three points that you've listed are solved there in one way or another.

Of course, it comes with its own set of tradeoffs / disadvantages, but in my scientific life I can't say I was really constrained by them, because Cython blends with Python so naturally (per module and per function). I was actually always starting with pure Python and then going down to the level of SIMD assembly for the 1% of the code where it actually mattered (where 99% of the time was spent)... plus the whole MPI story for scaling.

I'm afraid the situation is simply so good that there is too little motivation to solve this in Python itself :-/ and solving it in Python has its own problems. I guess one first really needs to find cases where solving it in Python is rationally mandated, to gather enough momentum.

-- Sincerely yours, Yury V. Zaytsev
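(For reference, a minimal sketch of how Cython can cover the three points while keeping Python syntax, using Cython's documented "pure Python" mode; the module stays importable by a plain interpreter when the `cython` shadow module is installed.)

```python
import cython

@cython.cclass
class Point:
    # fixed, C-typed fields; no per-instance __dict__ after compilation
    x: cython.double
    y: cython.double
    z: cython.double

@cython.boundscheck(False)  # locally disable bound checks
@cython.wraparound(False)   # and negative-index handling
def sum_x(xs: cython.double[:]) -> cython.double:
    # typed memoryview: a homogeneous container of native doubles
    s: cython.double = 0.0
    i: cython.Py_ssize_t
    for i in range(xs.shape[0]):
        s += xs[i]
    return s
```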
----- Original Message -----
From: "Yury V. Zaytsev" <yury@shurup.com>
To: "PIERRE AUGIER" <pierre.augier@univ-grenoble-alpes.fr>
Cc: "pypy-dev" <pypy-dev@python.org>
Sent: Tuesday, 5 January 2021 15:06:02
Subject: Re: [pypy-dev] New Python/PyPy extension for object-oriented numerically intensive codes?
On Tue, 5 Jan 2021, PIERRE AUGIER wrote:
I thought about what was missing and it seems to me that it could be provided by a Python/PyPy extension without addition to the Python language. However, Numpy API is IMHO not adapted. Now that Python is (and will be more and more) compared to Julia, I think it becomes necessary to have a good tool to write efficient numerical codes in pure Python style.
Hi Pierre,
I assume that you've had a detailed look at Cython, haven't you? All three points that you've listed are solved there in one way or another.
Of course, it comes with its own set of tradeoffs / disadvantages, but in my scientific life I can't say I was really constrained by them, because Cython blends with Python so naturally (per module and per function). I was actually always starting with pure Python and then going down to the level of SIMD assembly for the 1% of the code where it actually mattered (where 99% of the time was spent)... plus the whole MPI story for scaling.
I used Cython quite a lot some years ago. I'm actually pretty happy that we don't use it anymore in the Fluiddyn packages :-) For ahead-of-time compilation of Python, Transonic-Pythran is in my opinion nicer to use, and I usually get more efficient results with nicer code than with the big C-like Cython extensions that we used to have (a small Transonic example is sketched below).
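(A minimal sketch of the Transonic style Pierre refers to, adapted from the Transonic documentation; treat the exact type-string syntax as indicative and check it against the current release.)

```python
import numpy as np
from transonic import boost

@boost
def row_sum(arr: "int[:, :]", columns: "int[:]"):
    # the same file runs interpreted under plain Python and can be
    # ahead-of-time compiled by Pythran through Transonic
    return arr.T[columns].sum(0)
```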
I'm afraid the situation is simply so good that there is too little motivation to solve this in Python itself :-/ and solving it in Python has its own problems. I guess one first really needs to find cases where solving it in Python is rationally mandated, to gather enough momentum.
A big issue IMHO with Cython is that Cython code is not compatible with Python and can't be interpreted, so we lose the advantages of an interpreted language in terms of development: one small change in a big extension and one needs to recompile everything. For me, debugging is also really harder (maybe because I'm not good at debugging native code). Moreover, one actually needs to know (a bit of) C to write efficient Cython code, so it's difficult for some contributors to understand and develop Cython extensions. Therefore, I'm convinced that the situation is not so good (see also https://fluiddyn.netlify.app/transonic-vision.html).

It's also interesting to compare what can be done in Python (and Cython) and in Julia in terms of scientific computing. Again, it would be very useful in the long term to be able to write more efficient codes in "simple" interpreted Python.
-- Sincerely yours, Yury V. Zaytsev
I played a bit with cffi, which is not so far from what would be needed to develop the extension that I'd like to have. For example, this is quite efficient:

```python
from cffi import FFI

ffi = FFI()
ffi.cdef(
    """
    typedef struct {
        double x, y, z;
    } point_t;
    """
)

def sum_x(vec):
    s = 0.0
    for elem in vec:
        s += elem.x
    return s

points = ffi.new("point_t[]", 1000)
```

```
In [5]: %timeit sum_x(points)
1.34 µs ± 0.693 ns per loop
```

compared to, in Julia:

```
$ julia microbench_sum_x.jl
sum_x(positions)
  1.031 μs (1 allocation: 16 bytes)
```

However, `points` is an instance of `_cffi_backend._CDataBase` (as is `points[0]`), and it's not possible to add methods to these objects. As soon as I hide the `_cffi_backend._CDataBase` objects in Python objects, it becomes much, much slower (a sketch of such a wrapper follows below).

This makes me think again that PyPy would really need a nice extension to write Python that is a bit less dynamic than standard Python but more efficient. So my questions are: Is it technically possible to extend Python and PyPy to develop such an extension and make it very efficient? Which tools should be used? How should it be written?

Pierre
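(To make the slowdown Pierre describes concrete, here is a hedged sketch of the kind of Python-level wrapping he means; `PointProxy` is hypothetical, not from his repository.)

```python
class PointProxy:
    """Hypothetical wrapper hiding a cffi cdata struct behind a Python
    object; each field access now goes through a Python-level attribute
    lookup, and each loop iteration allocates a fresh wrapper."""

    def __init__(self, cdata):
        self._cdata = cdata

    @property
    def x(self):
        return self._cdata.x

def sum_x_wrapped(vec):
    s = 0.0
    for elem in vec:
        # one wrapper allocation per element: this is the pattern Pierre
        # reports as much slower than iterating over the raw cdata
        s += PointProxy(elem).x
    return s
```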
On Wed, 6 Jan 2021, PIERRE AUGIER wrote:
A big issue IMHO with Cython is that Cython code is not compatible with Python and can't be interpreted, so we lose the advantages of an interpreted language in terms of development: one small change in a big extension and one needs to recompile everything.
That's a valid point to a certain extent. However, in my experience, I was always somehow able to extract individual small functions into mini-modules, and then I wrote some Makefile / setuptools glue to automate chained recompilation of all the parts that changed whenever I ran the unit tests or the command line interface, so recompilation kept annoying me only until I got the magic to work :-)
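(A minimal sketch of such glue, with hypothetical module names; `cythonize()` only regenerates and rebuilds extensions whose sources changed, which is what keeps the edit-test cycle short.)

```python
# setup.py -- rebuild only the Cython mini-modules that changed
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="fast_kernels",
    ext_modules=cythonize(
        ["fast_sum.pyx", "fast_dot.pyx"],  # hypothetical mini-modules
    ),
)
```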
For me, debugging is also really harder (maybe because I'm not good at debugging native code). Moreover, one actually needs to know (a bit of) C to write efficient Cython code, so it's difficult for some contributors to understand and develop Cython extensions.
I must admit that I never needed to debug anything because I was doing TDD in the first place, but you are probably right: debugging generated monster code must be quite scary compared to pure Python code with full IDE support like PyCharm.

Anyway, call me a chauvinist, but I'd say it's just a sad fact of life that you need to know a thing or two about writing correct numeric low-level performance-oriented code. I assume you know it anyway, and I'm sure that your worked-up summation example below was just to make a completely different point, but as a matter of fact, in your code the worst-case error grows proportionally to the number of elements in the vector (N), and the RMS error grows proportionally to the square root of N for random inputs, so the results of your computations are going to be accordingly pretty random in the general case ;-) (see the compensated-summation sketch below).

Where I'm going with this is that people who do this kind of stuff are somehow not bothered by Cython's problems, and people who don't are rightfully bothered by valid issues, but if they are going to be helped, will it help their cause :-) ? Who knows...

On top of that, again, there is the whole MPI story. I used to write Python stuff that scaled to hundreds of thousands of cores. I still did SIMD inside OpenMP threads on the local nodes on top of that just for kicks, but actually I could have achieved a 4x speedup just by scheduling my jobs overnight with 4x cores instead and saved myself the trouble. But I wanted trouble, because it was fun :-) Cython and mpi4py make MPI almost criminally easy in Python, so once you get this far, there comes the question: does 2x or 4x on the local node actually matter at all?
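(A minimal illustration of the numerical point, not from the original thread: a compensated (Kahan) summation keeps the accumulated rounding error roughly independent of N, unlike the naive accumulation loop.)

```python
def kahan_sum_x(vec):
    """Compensated summation over the .x fields; the extra variable c
    carries the low-order bits lost by each addition."""
    s = 0.0
    c = 0.0
    for elem in vec:
        y = elem.x - c      # subtract the previously lost low-order part
        t = s + y           # s is big, y is small: low bits of y are lost
        c = (t - s) - y     # recover what was just lost, negated
        s = t
    return s
```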
So my questions are: Is it technically possible to extend Python and PyPy to develop such an extension and make it very efficient? Which tools should be used? How should it be written?
It is absolutely technically possible, and it is a good idea as far as I'm concerned, but I think that the challenge lies in developing conventions for the semantics and getting people to accept them. I think that the zoo of various accelerators / compilers / boosters for Python only proves that this must be the hard part. As for a mechanism to access the backing buffers, cffi is definitely the right tool: PyPy can already "see through" it, as you've proven with your small example. -- Sincerely yours, Yury V. Zaytsev
Hello,

I thought again about this performance issue and this possible extension. One very important point is to be able to define immutable structures (like Julia structs) in Python, and vectors of such structures. This is really a big advantage of Julia, which makes it much more efficient (see https://github.com/paugier/nbabel/tree/master/py/microbench).

I completely rewrote the presentation of this potential new extension: https://github.com/paugier/nbabel/blob/master/py/vector_v2.md It is much less focused on array programming and more on simple object-oriented programming. It would allow one to write the equivalents of very efficient Julia codes in Python, for example something like this:

```python
import ooperf as oop

@oop.native_bag
class Point4D:  # an immutable struct
    x: float
    y: float
    z: float
    w: float

    def square(self):
        return self.x**2 + self.y**2 + self.z**2 + self.w**2

Points = oop.Vector[Point4D]
points = Points.empty(1000)
```

I have 2 questions:

- Please, can anyone tell me how an extension providing `native_bag` and `Vector` could be written for PyPy? Which tool should be used?
- I also see that Julia is able to vectorize code like the line in `square`, but PyPy cannot (even for a cffi struct). Why? Is there a deep reason for that?

I conclude from my small experiments that cffi + Python is not sufficient (a rough sketch of the limitation follows below).

Pierre
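(A hedged, pure-Python sketch of how far one gets today with cffi alone; the `Points` class mimics the proposed `oop.Vector[Point4D]` with hypothetical names, and shows the limitation Pierre points out: cffi cdata cannot carry methods, so `square` has to live outside the struct, and any Python-level wrapping of the elements costs speed.)

```python
from cffi import FFI

ffi = FFI()
ffi.cdef("typedef struct { double x, y, z, w; } point4d_t;")

class Points:
    """Hypothetical prototype of oop.Vector[Point4D] backed by a
    native cffi array."""

    def __init__(self, n):
        self._data = ffi.new("point4d_t[]", n)
        self._size = n

    @classmethod
    def empty(cls, n):
        return cls(n)

    def __len__(self):
        return self._size

    def __getitem__(self, i):
        return self._data[i]   # raw cdata: fast, but no methods on it

def square(p):
    # must be a free function: methods cannot be attached to cdata
    return p.x**2 + p.y**2 + p.z**2 + p.w**2

points = Points.empty(1000)
total = sum(square(points[i]) for i in range(len(points)))
```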