[SciPy-Dev] Using Pythran to compile some of the scipy internals

Serge Guelton serge.guelton at telecom-bretagne.eu
Mon Jan 22 09:28:40 EST 2018


> Another potential benefit is to decrease the size of the binary distribution of
> SciPy. Cython extensions are quite expensive in this respect. Do you have an
> idea of how Pythran compares? Both in the case of only float64 inputs, and with
> templated inputs? A good example of the latter is scipy.ndimage.label, which is
> a straightforward function that ends up being a 700kb .so

Thanks for pointing out this aspect. There's still room for improvement
here, but with the current Pythran master and the same compilation flags,
the Cython implementation of `max_len_seq_inner` comes out at ~700kb while
the Pythran version is ~200kb.

I'm going to investigate that a bit more though, and will probably post the
results in this thread later on.
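
To make the dtype "templating" part of the question concrete: on the
Pythran side each supported signature is declared with a `#pythran export`
comment, and only those instantiations end up in the generated binary.
Here is a minimal sketch with a made-up kernel (not actual scipy code),
just to show the annotation style:

    #pythran export clipped_sum(float32[], float32)
    #pythran export clipped_sum(float64[], float64)
    def clipped_sum(values, threshold):
        # sum the entries of `values` that do not exceed `threshold`;
        # the body is plain Python, the export lines above drive which
        # dtype instantiations Pythran generates
        total = 0.0
        for v in values:
            if v <= threshold:
                total += v
        return total

Compiling that file with `pythran clipped_sum.py` gives a single extension
module containing both instantiations, which is the number to compare
against a Cython fused-type build of the same kernel.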

>     This adds an extra dependency on Pythran, which uses C++ as backend.
> 
> 
> Only a build-time dependency. That's not my major worry from the maintenance
> point of view.

And a dependency on libstdc++ too.

>     This increases the failure surface. Although alive since 2012 and being
>     tested a lot [1] on Linux (but scarcely on Windows), it is obviously
>     less mature than cython
> 
> 
> This is a worry. We need Windows, Linux, macOS (officially supported, also the
> 32-bit flavors) as well as less commonly used Unix/BSD-like platforms.
> 
> I guess Windows needs separate testing, both with gcc and MSVC. But for all
> other platforms, can you say something about portability based on the kind of
> C++ Pythran generates?

Portability on OSX has never required extra work; it's just less tested than
the Linux version. There were some C++11 support issues with the MSVC compiler
back when I tried, but that was some 2 years ago. I'll try again.


>     = Alternatives
> 
>     There is an experimental Pythran mode in cython[2] that uses Pythran as
>     a backend for numpy operations. Unfortunately it is still at an early
>     stage and cannot translate calls to ``np.roll`` or
>     ``np.sum(a[i,:] - b[j,:])``, while Pythran supports them.
> 
>     Instead of translating Cython files, I could also focus on some
>     pure-python functions. I tested Pythran on the rosenbrock function and I
>     get a good speedup (from 1.5x to 4x, depending on whether vectorization
>     is enabled) there too.
> 
>     So yeah, that's a rather long introduction to probe the interest here
>     around that idea :-)
> 
> 
> Interest in general, but there's a long way to go - our requirements are pretty
> demanding. I have seen some comparisons between Cython, Pythran and Numba in
> terms of performance and ease of use, but never a comprehensive comparison from
> the point of view of library authors. I know Travis O. has an interest in
> seeing Numba being adopted more widely, which will also need such a comparison.
> It should cover at least:
> 
> - portability
> - performance
> - maturity
> - maintenance status (active devs, how quickly do bugs get fixed after a
> release with an issue)
> - ease of use (@jit vs. Pythran comments vs. translate to .pyx syntax)
> - size of generated binaries
> - templating support for multiple dtypes
> - debugging and optimization experience/tools

Independently of the potential inclusion of Pythran in scipy, those are all
very valuable points. I'm 100% biased, but I would say that community and
debugging are two weak points of Pythran, especially when compared to Cython.
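
On the ease-of-use point specifically, the rosenbrock example mentioned
above is close to the best case for annotation-style tools: the Pythran
version is just the plain numpy function plus one export comment, along
these lines (a paraphrase, not the exact file I benchmarked):

    #pythran export rosen(float64[])
    import numpy as np

    def rosen(x):
        # standard Rosenbrock sum, written with vectorized numpy so that
        # Pythran's expression fusion and SIMD backend can kick in
        return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2
                      + (1.0 - x[:-1]) ** 2)

The Numba take on it is the same body behind an `@jit` decorator, and the
Cython route means rewriting it in .pyx syntax with typed memoryviews,
which is roughly what the ease-of-use item above boils down to.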

++
Serge

