[pypy-dev] Questions on the pypy+numpy project

Stefan Behnel stefan_ml at behnel.de
Mon Oct 17 12:11:48 CEST 2011


David Cournapeau, 17.10.2011 00:01:
> On Sun, Oct 16, 2011 at 10:20 PM, Ian Ozsvald wrote:
>> how big is the scipy ecosystem beyond numpy? What's the rough line
>> count for Python, C, Fortran etc that depends on numpy?
>
> The ecosystem is pretty big. There are at least on the order of
> hundreds of packages that depend directly on numpy and scipy.
>
> For scipy alone, the raw count is around 150k-300k LOC (it is a bit
> hard to estimate because we include some swig-generated code that I
> have ignored here, and some code duplication to deal with distutils
> insanity). There is around 80k LOC of fortran alone in there.
>
> More and more scientific code uses cython for speed or just for
> interfacing with C (and recently C++). Other tools have been used for
> similar reasons (f2py, in particular, to automatically wrap fortran
> and C).

and fwrap nowadays, which also generates glue code for talking to Fortran 
from Cython code, through a thin C code wrapper (AFAIK).


> f2py at least is quite tightly coupled to the numpy C API. I know
> there is work for a pypy-friendly backend for cython, but I don't know
> where things are there.

It's, erm, resting. The GSoC is over, the code hasn't been merged into 
mainline yet, lacks support for some recent Cython language features and is 
not in a state that would allow building anything major with it right away.

It's based on ctypes, so it suffers from the same problems as ctypes, 
namely API/ABI inconsistencies beyond those that "ctypes_configure" can 
handle. In particular, things like talking to C macros will at least 
require additional C glue code to be generated, which doesn't currently 
happen. What does work is stripping the Cython-specific syntax from the code
and mapping "regular" C code interactions to the corresponding ctypes calls.
So some things work as they are; everything else needs more work. Helping
hands and funding are welcome.
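
To illustrate the macro problem in plain ctypes terms, here is a minimal
sketch. It assumes a POSIX system where the C library can be loaded through
ctypes; the helper names c_strlen and has_symbol are made up for this
example and are not part of the Cython backend.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library() may return None on some
# platforms, in which case CDLL(None) exposes the symbols already loaded
# into the running process (which include the C library on Linux).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# A "regular" C function is an exported symbol, so it maps directly onto
# a ctypes call once its signature has been declared by hand.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical helper: call the C strlen() through ctypes."""
    return libc.strlen(data)


def has_symbol(lib: ctypes.CDLL, name: str) -> bool:
    """Hypothetical helper: report whether the library exports `name`."""
    # ctypes raises AttributeError for unknown symbols, so hasattr() works.
    return hasattr(lib, name)


if __name__ == "__main__":
    print(c_strlen(b"numpy"))
    # strlen is a real function in the shared library.
    print(has_symbol(libc, "strlen"))
    # EOF is a preprocessor macro from <stdio.h>. It is expanded at compile
    # time and leaves no symbol behind, so ctypes has nothing to look up.
    print(has_symbol(libc, "EOF"))
```

Anything defined as a macro is only reachable through a small compiled C
shim that wraps it in a real function or variable, which is the kind of
glue code mentioned above.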

That being said, I still think it's a promising approach, and it would be 
very interesting for PyPy to support Cython code (in one way or another). 
Cython certainly has a good standing in the Scientific Python community 
these days. If PyPy wants to enter that community as well, it will have to
show that it can easily and efficiently interface with the huge amount of
existing scientific code out there, be it C, C++, Fortran, Cython or
whatever. And rewriting the code, or even just the wrappers, for Yet Another
Python Implementation is not a scalable solution to that problem.


> I would like to see less C boilerplate code in scipy, and more cython
> usage (which generates faster code and is much more maintainable); this
> can also benefit pypy, if only for making the scipy code less
> dependent on CPython details.

And by making the implementation essentially Python. That way, it can much 
more easily be ported to other Python platforms, especially PyPy, than if 
you have to start by reverse engineering even the exact wrapper signature 
from C code.

Stefan


