[pypy-dev] Questions on the pypy+numpy project

Mon Oct 17 00:01:56 CEST 2011

Hi Ian,

On Sun, Oct 16, 2011 at 10:20 PM, Ian Ozsvald <ian at ianozsvald.com> wrote:

>
> I'd like to pose some questions:
> * how big is the scipy ecosystem beyond numpy? What's the rough line
> count for Python, C, Fortran etc that depends on numpy?

The ecosystem is pretty big. There are at least on the order of a
hundred packages that depend directly on numpy and scipy.

For scipy alone, the raw count is around 150k-300k LOC (it is a bit
hard to estimate because we include some swig-generated code that I
have ignored here, and some code duplication to deal with distutils
insanity). There is around 80k LOC of fortran alone in there.

More and more scientific code uses cython for speed or just for
interfacing with C (and recently C++). Other tools have been used for
similar reasons (f2py, in particular, to automatically wrap fortran
and C). f2py at least is quite tightly coupled to the numpy C API. I
know there is work on a pypy-friendly backend for cython, but I don't
know where things stand there.

I would like to see less C boilerplate code in scipy, and more cython
usage (which generates faster code and is much more maintainable); this
can also benefit pypy, if only by making the scipy code less
dependent on CPython details.

One thing I have little doubt about is that pypy needs a "story" that
makes wrapping of fortran/c/c++ libraries easy, because otherwise few
people in the scientific community will be interested. For better or
worse, there are tens of millions of lines of code written in those
languages, and a lot of it is domain specific (you will not write
decent FFT code without knowing a lot about its implementation
details, same for large eigenvalue problems). There need to be some
automatic wrapper generators. Scipy alone easily wraps a thousand
functions written in fortran, if not more.
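
To make the wrapping point concrete, here is a minimal sketch of
calling a compiled C function without going through the CPython C API.
It assumes ctypes (which I have not mentioned above and which is only
one possible approach) and a POSIX system where the C math library can
be located; the helper names load_libm and c_cos are made up for the
illustration. A wrapper generator would have to emit declarations like
these automatically for every wrapped routine.

```python
import ctypes
import ctypes.util
import math


def load_libm():
    """Load the C math library (assumption: a POSIX-like system)."""
    name = ctypes.util.find_library("m")
    # If no separate libm is found, fall back to the symbols already
    # loaded into the running process (works on POSIX only).
    return ctypes.CDLL(name) if name else ctypes.CDLL(None)


libm = load_libm()

# Declare the C signature by hand: double cos(double).
# This is the boilerplate a wrapper generator would produce for us.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Thin Python wrapper around the C library's cos()."""
    return libm.cos(float(x))


if __name__ == "__main__":
    # Compare the wrapped C function with Python's own math.cos.
    print(c_cos(1.0), math.cos(1.0))
```

Doing this by hand for one function is easy; doing it for a thousand
fortran routines with array arguments is exactly where automatic
generation becomes necessary.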

cheers,

David