[pypy-dev] New Python/PyPy extension for object oriented numerically intensive codes ?

PIERRE AUGIER pierre.augier at univ-grenoble-alpes.fr
Wed Jan 6 15:40:52 EST 2021

----- Original message -----
> From: "Yury V. Zaytsev" <yury at shurup.com>
> To: "PIERRE AUGIER" <pierre.augier at univ-grenoble-alpes.fr>
> Cc: "pypy-dev" <pypy-dev at python.org>
> Sent: Tuesday, 5 January 2021 15:06:02
> Subject: Re: [pypy-dev] New Python/PyPy extension for object oriented numerically intensive codes ?

> On Tue, 5 Jan 2021, PIERRE AUGIER wrote:
>> I thought about what was missing and it seems to me that it could be
>> provided by a Python/PyPy extension without addition to the Python
>> language. However, Numpy API is IMHO not adapted. Now that Python is
>> (and will be more and more) compared to Julia, I think it becomes
>> necessary to have a good tool to write efficient numerical codes in pure
>> Python style.
> Hi Pierre,
> I assume that you've had a detailed look at Cython, haven't you? All three
> points that you've listed are solved there in one way or another.
> Of course, it comes with its own set of tradeoffs / disadvantages, but in
> my scientific life I can't say I was really constrained by them, because
> Cython blends with Python so naturally (per module and per function), so I
> was actually always starting with pure Python and then going down to
> the level of SIMD assembly for 1% of the code where it actually mattered
> (99% time was spent)... plus the whole MPI story for scaling.

I used Cython quite a lot some years ago. I'm actually pretty happy that we don't use it anymore in the Fluiddyn packages :-)
For ahead-of-time compilation of Python, Transonic-Pythran is in my opinion nicer to use, and I usually get more efficient results with nicer code than with the big C-like Cython extensions that we used to have.

> I'm afraid the situation is simply so good that there is too little
> motivation to solve this in Python itself :-/ and solving it in Python has
> its own problems. I guess first one really needs to find cases, when
> solving it in Python is rationally mandated to gather enough momentum.

A big issue IMHO with Cython is that Cython code is not compatible with Python and can't be interpreted, so we lose the development advantages of an interpreted language. After one small change in a big extension, one needs to recompile everything. For me, debugging is really harder (maybe because I'm not good at debugging native code). Moreover, one actually needs to know (a bit of) C to write efficient Cython code, so it's difficult for some contributors to understand and develop Cython extensions.

Therefore, I'm convinced that the situation is not so good (see also https://fluiddyn.netlify.app/transonic-vision.html). It's also interesting to compare what can be done in Python (and Cython) and in Julia in terms of scientific computing.

Again, it would be very useful in the long term to be able to write more efficient code in "simple" interpreted Python.

> --
> Sincerely yours,
> Yury V. Zaytsev

I played a bit with cffi, which is not so far from what would be needed to develop the extension that I'd like to have.

For example, this is quite efficient:

from cffi import FFI
ffi = FFI()

# Declare the C struct so that ffi.new() knows the layout of point_t.
ffi.cdef("""
    typedef struct {
        double x, y, z;
    } point_t;
""")

def sum_x(vec):
    s = 0.0
    for elem in vec:
        s += elem.x
    return s

points = ffi.new("point_t[]", 1000)

In [5]: %timeit sum_x(points)
1.34 µs ± 0.693 ns per loop

compared with Julia:

$ julia microbench_sum_x.jl                   
  1.031 μs (1 allocation: 16 bytes)

However, `points` is an instance of `_cffi_backend._CDataBase` (as is `points[0]`) and it's not possible to "add methods to these objects".

As soon as I hide the `_cffi_backend._CDataBase` objects such as `points[0]` inside Python objects, it becomes much, much slower.
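
To make "hiding the cdata objects in Python objects" concrete, here is a minimal sketch of one way such a wrapper could look. The names `Point`, `Points` and `norm_square` are hypothetical and only for illustration; they are not the code that was timed above, and no timing is claimed for this sketch.

```python
from cffi import FFI

ffi = FFI()
ffi.cdef("""
    typedef struct {
        double x, y, z;
    } point_t;
""")


class Point:
    """Hypothetical wrapper adding methods to one point_t element."""

    __slots__ = ("_data",)

    def __init__(self, data):
        # `data` is a cffi cdata reference to one element of the array.
        self._data = data

    @property
    def x(self):
        return self._data.x

    def norm_square(self):
        d = self._data
        return d.x**2 + d.y**2 + d.z**2


class Points:
    """Hypothetical container yielding Point wrappers instead of raw cdata."""

    def __init__(self, size):
        self._size = size
        # ffi.new() zero-initializes the array; the container owns the memory.
        self._data = ffi.new("point_t[]", size)

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        return Point(self._data[index])

    def __iter__(self):
        # One Python-level wrapper object is created per element.
        for i in range(self._size):
            yield Point(self._data[i])


def sum_x(vec):
    s = 0.0
    for elem in vec:
        s += elem.x
    return s


points = Points(1000)
points._data[0].x = 1.5
points._data[1].x = 2.5
total = sum_x(points)
```

The same `sum_x` now goes through an extra Python object and a property call for every element, which is the kind of indirection referred to above.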

This makes me think again that PyPy would really need a nice extension to write Python that is a bit less dynamic than standard Python but more efficient.

So my questions are: Is it technically possible to extend Python and PyPy to develop such an extension and make it very efficient? Which tools should be used? How should it be written?

