[Cython] Add a Pythran backend for Numpy operation

Adrien Guinet adrien at guinet.me
Wed Feb 15 16:15:40 EST 2017


Hello everyone!

I've been working for quite some time on the usage of Pythran as a backend for
the Numpy operations that Cython can generate. The associated PR on github can
be found here: https://github.com/cython/cython/pull/1607. This work has been
sponsored by the OpenDreamKit project
(https://github.com/OpenDreamKit/OpenDreamKit/).

First of all, the Pythran project
(https://github.com/serge-sans-paille/pythran) is a compiler from (a subset of)
Python to C++ that aims at optimizing "scientific" Python code. It also provides
a full C++ implementation of a major part of the Numpy API. Among the advantages
of this implementation are its support for expression templates and SIMD
instructions (partially thanks to Boost.SIMD [1]).
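For readers unfamiliar with Pythran, a standalone Pythran module is ordinary Python annotated with an export comment. The sketch below follows the export-comment syntax shown in Pythran's README. The file name and the `float[]` signature (1-D float64 arrays) are illustrative assumptions. Run as plain Python, the file simply uses NumPy. Compiled with the `pythran` command-line tool, it becomes a native extension module (SIMD-related compiler flags are not shown).

```python
# sqrt_sum.py: hypothetical standalone Pythran module (illustrative file name).
# The export comment tells Pythran which signature to compile. CPython treats
# it as an ordinary comment, so the file also runs unmodified with NumPy.
#pythran export sqrt_sum(float[], float[])
import numpy as np


def sqrt_sum(a, b):
    # Same expression as the benchmark later in this mail. When compiled by
    # Pythran, the arithmetic is meant to go through its expression templates
    # rather than NumPy's one-operator-at-a-time evaluation.
    return np.sqrt(np.sqrt(a * a + b * b))
```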

One of the limitations of the current Numpy support in Cython is that it relies
on the original Numpy Python module for a lot of computations. The overall idea
is to replace these calls with the Numpy implementation provided by the Pythran
project.

In this mail I'll discuss the various choices that have been made and why, as
well as some implementation details. Then I'll show some benchmarks of the
potential improvements, which is the point of all this in the end :)

Pythran limitations
-------------------

The Pythran Numpy implementation has some limitations:

* array "views" are not supported. This means that arrays must be stored in
  contiguous memory. Both Fortran and C-style layouts are supported.
* the endianness of the integers must be the same as that of the target
  architecture (note that Cython has the same limitation)

That's why we did two things:

* the Pythran backend must be explicitly requested by the user, either by
  passing the --np-pythran flag to the Cython compiler, or by passing the
  "np_pythran" flag to the cythonize call (for distutils)
* in function arguments, Numpy buffers are replaced by fused types, so that we
  can fall back to the original implementation for unsupported buffers. More
  on this below.
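Both limitations listed above can be observed from Python through NumPy's `flags` and `dtype.isnative` attributes. The arrays below are arbitrary examples, chosen only to show which kind of input falls on each side of these limitations.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Freshly allocated arrays are contiguous: C order by default...
print(a.flags["C_CONTIGUOUS"])                   # True
# ...or Fortran order when requested explicitly (this makes a copy).
f = np.asfortranarray(a)
print(f.flags["F_CONTIGUOUS"])                   # True

# A strided slice is a view with gaps in memory: neither C- nor F-contiguous.
col = a[:, 1]
print(col.flags["C_CONTIGUOUS"], col.flags["F_CONTIGUOUS"])  # False False

# Swapping the byte order of the dtype gives a non-native-endian array.
swapped = a.astype(a.dtype.newbyteorder())
print(swapped.dtype.isnative)                    # False
```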

Implementation choices and details within Cython
------------------------------------------------

a) PythranExpr

We defined a new type in PyrexTypes.py, which represents a Pythran buffer or
expression. A Pythran expression is associated with a Pythran expression
template, whose C++ type can be something like "decltype(a+b)". We compose
every expression/function call this way, which lets us use Pythran's
expression template mechanism.
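The composition idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the code in PyrexTypes.py: the helper names, the `ArrayT` leaf type and the `numpy_sqrt` callable are invented placeholders, and real Pythran type names are more involved. Each node's C++ type is spelled as a `decltype` over its children's types, so the nesting mirrors the expression tree and the C++ compiler performs the actual deduction.

```python
# Hypothetical sketch: spell the C++ type of a Numpy expression as nested
# decltype(...) strings. Leaves are plain C++ type names; every operator or
# function call wraps its operands' types with std::declval<>().


def declval(cpp_type: str) -> str:
    # Inside decltype, std::declval<T>() stands for "some expression of type T".
    return f"std::declval<{cpp_type}>()"


def binop_type(op: str, lhs: str, rhs: str) -> str:
    # Type of "lhs op rhs", i.e. something like decltype(a + b).
    return f"decltype({declval(lhs)} {op} {declval(rhs)})"


def call_type(func: str, *args: str) -> str:
    # Type of "func(args...)"; func is a placeholder C++ callable name.
    return f"decltype({func}({', '.join(declval(t) for t in args)}))"


# Type of sqrt(sqrt(a*a + b*b)), with ArrayT as a placeholder leaf type.
a2 = binop_type("*", "ArrayT", "ArrayT")
b2 = binop_type("*", "ArrayT", "ArrayT")
total = binop_type("+", a2, b2)
result = call_type("numpy_sqrt", call_type("numpy_sqrt", total))
print(result)
```

Because the final type is only ever written down, never computed in Python, any mismatch surfaces as a C++ compile error, which is the trade-off discussed next.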

We also chose to let the C++ compiler deduce the final type of every
expression, and emit errors if something goes wrong. This choice avoids having
to rewrite in Python all the (potentially implicit) conversion rules that can
apply in a C/C++ program, which would be error-prone. The disadvantage is that
the end user may get non-trivial error messages.
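To give an idea of what would otherwise have to be mirrored in Python, here is a deliberately partial sketch of C's "usual arithmetic conversions" for a few built-in types. It is only an illustration, not code from this PR. It assumes an LP64 platform (32-bit int, 64-bit long) and ignores enums, extended integer types and many other cases. The dependence on the platform's data model is exactly what makes such a port error-prone.

```python
# Deliberately partial sketch of C's usual arithmetic conversions.
# Assumes an LP64 data model (int: 32 bits, long and long long: 64 bits).

FLOATS = ("long double", "double", "float")      # widest first
RANK = {"int": 1, "long": 2, "long long": 3}
BITS = {"int": 32, "long": 64, "long long": 64}  # LP64 assumption
SMALL = {"bool", "char", "signed char", "unsigned char",
         "short", "unsigned short"}


def integer_promotion(t: str) -> str:
    # Types narrower than int are promoted to int (int holds all their values here).
    return "int" if t in SMALL else t


def split_sign(t: str):
    # "unsigned long" -> ("long", False); "long" -> ("long", True)
    if t.startswith("unsigned "):
        return t[len("unsigned "):], False
    return t, True


def usual_arithmetic_conversion(t1: str, t2: str) -> str:
    # 1. If either operand is floating point, the widest floating type wins.
    for f in FLOATS:
        if f in (t1, t2):
            return f
    # 2. Otherwise apply the integer promotions first.
    t1, t2 = integer_promotion(t1), integer_promotion(t2)
    (b1, signed1), (b2, signed2) = split_sign(t1), split_sign(t2)
    # 3. Same signedness: the operand with the higher rank wins.
    if signed1 == signed2:
        return t1 if RANK[b1] >= RANK[b2] else t2
    # 4. Mixed signedness.
    if signed1:
        s_base, s_type, u_base, u_type = b1, t1, b2, t2
    else:
        s_base, s_type, u_base, u_type = b2, t2, b1, t1
    if RANK[u_base] >= RANK[s_base]:
        return u_type                  # e.g. int + unsigned int -> unsigned int
    if BITS[s_base] > BITS[u_base]:
        return s_type                  # the signed type can hold every unsigned value
    return "unsigned " + s_base        # otherwise: unsigned version of the signed type
```

For instance, `long + unsigned int` gives `long` under this LP64 assumption, but `unsigned long` on an LLP64 platform such as 64-bit Windows, where long is only 32 bits wide.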

b) Fused types for function arguments

As Pythran has some limitations regarding the Numpy buffers it supports, we
chose to replace Numpy buffer arguments with a fused type that can be either a
Pythran buffer or the original Numpy buffer. Which of the two types is used is
decided according to the limitations described above.

This allows falling back to the original Cython implementation for unsupported
buffer types.
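At the Python level, the effect is roughly that of a runtime dispatch on the buffers passed in. The sketch below is a hypothetical Python analogue, not the code Cython generates: `pythran_compatible`, `choose_backend` and the two `_sqrt_sum_*` functions are invented names. Both branches are stood in for by plain NumPy so the example runs anywhere. Requiring every argument to be compatible before taking the Pythran path is also an assumption of this sketch.

```python
import numpy as np


def pythran_compatible(arr: np.ndarray) -> bool:
    # The two limitations listed earlier: contiguous memory (C or Fortran
    # order) and native byte order.
    contiguous = arr.flags["C_CONTIGUOUS"] or arr.flags["F_CONTIGUOUS"]
    return bool(contiguous and arr.dtype.isnative)


def choose_backend(*arrays: np.ndarray) -> str:
    # Assumption: take the Pythran path only if every buffer qualifies.
    return "pythran" if all(pythran_compatible(x) for x in arrays) else "numpy"


def _sqrt_sum_pythran(a, b):
    # Stand-in for the specialization compiled against Pythran's C++ Numpy.
    return np.sqrt(np.sqrt(a * a + b * b))


def _sqrt_sum_numpy(a, b):
    # Stand-in for the original code path calling the numpy Python module.
    return np.sqrt(np.sqrt(a * a + b * b))


def sqrt_sum(a, b):
    impl = _sqrt_sum_pythran if choose_backend(a, b) == "pythran" else _sqrt_sum_numpy
    return impl(a, b)
```

For example, a strided view such as `x[::2]` or a byte-swapped array would take the NumPy path, while a freshly allocated array would take the Pythran one.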

Tests
-----

A flag has been added to the runtests.py script. When given the path to a
Pythran installation, it runs the C++ tests in "Pythran" mode. This allows
reusing the whole Cython test suite.

Benchmark
---------

The whole point of this is to get better performance.

Here is a simple benchmark of what this mode can achieve, using this Cython code:

cimport numpy
import numpy

def sqrt_sum(numpy.ndarray[numpy.float_t, ndim=1] a,
             numpy.ndarray[numpy.float_t, ndim=1] b):
    return numpy.sqrt(numpy.sqrt(a*a+b*b))

On my computer (Core i7-6700HQ), this gives the following results, with an
array of 100000000 32-bit floats as input:

- for the classical Cython version: 960ms
- for the Cython version using the Pythran backend: 761ms
- for the Cython version using the Pythran backend using SIMD instructions: 243ms

which amounts to a speedup of ~3.95x (960 ms / 243 ms) with SIMD instructions,
versus ~1.26x (960 ms / 761 ms) without them.
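To reproduce the baseline on another machine, a timing harness along these lines could be used. It is a sketch: the array size, repetition count and random inputs are arbitrary choices, and it only times a pure-NumPy version of the function. The default `dtype` is float64, which matches `numpy.float_t` (a C double) in the Cython signature above.

```python
import timeit

import numpy as np


def sqrt_sum_numpy(a, b):
    # Pure-NumPy reference: NumPy evaluates one operator at a time,
    # typically materializing intermediate arrays along the way.
    return np.sqrt(np.sqrt(a * a + b * b))


def best_time(func, n=1_000_000, repeat=5, dtype=np.float64, seed=0):
    """Return the best wall-clock time (in seconds) of func(a, b) over `repeat` runs."""
    rng = np.random.default_rng(seed)
    a = rng.random(n, dtype=dtype)
    b = rng.random(n, dtype=dtype)
    func(a, b)  # warm-up call, not timed
    return min(timeit.repeat(lambda: func(a, b), number=1, repeat=repeat))


if __name__ == "__main__":
    t = best_time(sqrt_sum_numpy)
    print(f"pure NumPy: {t * 1e3:.1f} ms")
```

A compiled extension would be timed the same way, e.g. `best_time(mymodule.sqrt_sum, n=100_000_000)`, where `mymodule` is a placeholder for the module built from the Cython code above.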

Documentation
-------------

I added an example of how to use this with distutils to the documentation. It
could be moved elsewhere or formatted differently if needed.


I'd be happy to discuss the various choices made here, and the implementation
details.

Thanks everyone!

[1]: https://github.com/NumScale/boost.simd


More information about the cython-devel mailing list