[SciPy-Dev] Using Pythran to compile some of the scipy internals
Serge Guelton
serge.guelton at telecom-bretagne.eu
Mon Jan 22 09:28:40 EST 2018
> Another potential benefit is to decrease the size of the binary distribution of
> SciPy. Cython extensions are quite expensive in this respect. Do you have an
> idea of how Pythran compares? Both in case of only float64 inputs, and with
> templated inputs? A good example of the latter is scipy.ndimage.label, which is
> a straightforward function that ends up being a 700kb .so
Thanks for pointing out this aspect. There's still room for improvement
here, but with Pythran's current master and the same compilation flags,
the Cython implementation of `max_len_seq_inner` takes ~700kb while
the Pythran version takes ~200kb.
I'm going to investigate that a bit more though, and probably post the
results in this thread later on.
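
For readers who have not seen a Pythran input file, here is a rough
sketch of what the Pythran side of such a size comparison looks like. It
is a simplified, list-based shift-register kernel written for
illustration only, not SciPy's actual `max_len_seq_inner`; the name
`lfsr_inner` and its signature are assumptions. The file stays valid
Python, and running the `pythran` command on it should produce the
shared object whose size gets compared, although this exact snippet has
not been compiled here.

```python
# Hypothetical, simplified stand-in for a shift-register kernel.
# The export comment is the only Pythran-specific part of the file.

#pythran export lfsr_inner(int list, int list, int, int)
def lfsr_inner(taps, state, nbits, length):
    """Run `length` steps of a linear feedback shift register.

    Returns the output bits and the final state, rotated so that the
    current read position comes first.
    """
    state = list(state)  # work on a copy, leave the caller's list alone
    seq = [0] * length
    idx = 0
    for i in range(length):
        feedback = state[idx]
        seq[i] = feedback
        # XOR in the tapped bits, relative to the current position
        for tap in taps:
            feedback ^= state[(tap + idx) % nbits]
        state[idx] = feedback
        idx = (idx + 1) % nbits
    return seq, state[idx:] + state[:idx]
```

Without compilation the function can be called as ordinary Python, e.g.
`lfsr_inner([2], [1, 1, 1], 3, 7)`, which makes it easy to check the
compiled module against the interpreted one.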
> This adds an extra dependency on Pythran, which uses C++ as backend.
>
>
> Only a build-time dependency. That's not my major worry from the maintenance
> point of view.
And a dependency on libstdc++ too.
> This increases the failure surface. Although alive since 2012 and being
> tested a lot [1] on Linux (but scarcely on Windows), it is obviously
> less mature than cython
>
>
> This is a worry. We need Windows, Linux, macOS (officially supported, also the
> 32-bit flavors) as well as less commonly used Unix/BSD-like platforms.
>
> I guess Windows needs separate testing, both with gcc and MSVC. But for all
> other platforms, can you say something about portability based on the kind of
> C++ Pythran generates?
Portability on OSX never required extra work; it's just less tested than
the Linux version. There were some C++11 support issues with the MSVC
compiler back when I tried, but that was some 2 years ago. I'll try again.
> = Alternatives
>
> There is an experimental Pythran mode in cython[2] that uses Pythran as
> a backend for numpy operations. Unfortunately it is still at early
> stages and cannot translate calls to ``np.roll`` or
> ``np.sum(a[i,:] - b[j,:])`` while Pythran supports it.
>
> Instead of translating Cython files, I could also focus on some
> pure-python functions. I tested Pythran on the rosenbrock function and I
> get good speedup (from 1.5x to 4x depending on vectorization being
> enabled or not) there too.
>
> So yeah, that's a rather long introduction to probe the interest here
> around that idea :-)
>
>
> Interest in general, but there's a long way to go - our requirements are pretty
> demanding. I have seen some comparisons between Cython, Pythran and Numba in
> terms of performance and ease of use, but never a comprehensive comparison from
> the point of view of library authors. I know Travis O. has an interest in
> seeing Numba being adopted more widely, which will also need such a comparison.
> It should cover at least:
>
> - portability
> - performance
> - maturity
> - maintenance status (active devs, how quick do bugs get fixed after a release
> with an issue)
> - ease of use (@jit vs. Pythran comments vs. translate to .pyx syntax)
> - size of generated binaries
> - templating support for multiple dtypes
> - debugging and optimization experience/tools
Independently of the potential inclusion of Pythran in scipy, those are
very valuable points. I'm 100% biased but would say that community and
debugging are two weak points of Pythran, esp. when compared to Cython.
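
To make the "Pythran comments" and "multiple dtypes" items a bit more
concrete, here is a minimal sketch of a pure-Python Rosenbrock function
annotated for Pythran. The explicit loop and the list-based signatures
are assumptions chosen to keep the example dependency-free; this is not
the version used for the timings quoted above. Each
`#pythran export` line requests one specialization, which is roughly
how several input types are covered.

```python
# One export line per requested input type; the body is plain Python.
#pythran export rosen(float list)
#pythran export rosen(int list)
def rosen(x):
    """Rosenbrock function of a sequence x with at least two elements."""
    total = 0.0
    for i in range(len(x) - 1):
        total += 100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
    return total
```

The function has its global minimum of 0 at a vector of ones, so
`rosen([1.0, 1.0, 1.0])` is a quick sanity check for both the
interpreted and the compiled version.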
++
Serge