[SciPy-Dev] Numba as a dependency for SciPy?

Thu Mar 8 08:39:02 EST 2018

Hi everybody,

Here is my take on this issue. (I chose to reply to the first email of the
thread because, having read the whole thread, I wasn't sure where else to
insert a reply.)

First, I was excited when I read the beginning of the thread. Numba holds
great promise: it can make our code faster while keeping it high level.

But as the thread went on, it seemed that it is not ready for prime time yet.

I am not worried at all that it is a "company-driven project". The
license is appropriate, and I think that there is a real will to grow a
community.

What I am worried about is that it has not reached a sufficient level of
portability. It is important that the base of the pyramid works easily on
less mainstream platforms like ARM. Embedded systems are important for
science, whether it be academic, industrial, or citizen science. From
what I read on this thread, there is still work to be done there,
including at the level of the Python packaging system. Things have
improved hugely in the last few years, so I am extremely hopeful.

I worry about the proposed solution in which @jit falls back to a no-op
when numba is not available, because it means that users in such
situations will experience silent slowdowns. I remember that a few years
ago, it was common for cluster administrators to install numpy without
linking it to an optimized BLAS (using the embedded lapack-lite), and on
many clusters numpy was unusable. Computing clusters can also be quite
hostile environments for installation: some don't have access to the
Internet, and the libraries must be installed in an existing Python
distribution to play well with domain-specific libraries.
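To make the concern concrete, the no-op fallback usually amounts to a guarded import. The following is a minimal, hypothetical sketch (not code from Numba or SciPy; `NUMBA_AVAILABLE` and `sum_of_squares` are illustrative names): if `numba` cannot be imported, `jit` is replaced by a decorator that returns the function unchanged, so the code still runs and gives the same result, only as plain Python, and nothing tells the user.

```python
try:
    from numba import jit  # the real Numba decorator, if installed
    NUMBA_AVAILABLE = True
except ImportError:
    NUMBA_AVAILABLE = False

    # Hypothetical no-op fallback: hands back the undecorated function.
    # No warning is emitted, so the slowdown is silent.
    def jit(*args, **kwargs):
        # Bare usage: @jit
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]

        # Parametrized usage: @jit(nopython=True) or @jit("f8(f8)")
        def decorator(func):
            return func

        return decorator


@jit(nopython=True)
def sum_of_squares(values):
    # A tight loop: fast when compiled by Numba, slow in pure Python.
    total = 0.0
    for v in values:
        total += v * v
    return total


@jit
def add_one(x):
    return x + 1
```

Both decorator forms work with the fallback, and callers get identical results either way, which is exactly why a missing numba would go unnoticed.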

Debugging is also a problem, but it seems slightly less of a showstopper
to me.

I would say that we should postpone this decision. Ideally, as a
community, we should help numba and the packaging ecosystem get to a
point where portability is no longer a problem.

Cheers,

Gaël

On Mon, Mar 05, 2018 at 08:06:11PM -0800, Ralf Gommers wrote:
> Hi all,

> Goal of this email: start a discussion to decide whether we'd be okay with
> relying on Numba as a dependency, now or in 1-2 years' time.

> Context: in https://github.com/pydata/sparse/issues/126 a discussion is ongoing
> about whether to adopt Cython or Numba, with Numba being preferred by the
> majority. That `sparse` package is meant to provide sparse *arrays* that down
> the line should either replace our current sparse *matrices* or at least
> be integrated into scipy.sparse alongside them. See
> https://github.com/scipy/scipy/issues/8162 and
> https://github.com/hameerabbasi/sparse-ndarray-protocols for more details on that.

> Also related is the question from Serge Guelton some weeks ago about whether
> we'd want to rely on Pythran:
> https://mail.python.org/pipermail/scipy-dev/2018-January/022325.html

> On that Pythran thread I commented that we'd want to take these aspects into
> account:
> - portability
> - performance
> - maturity
> - maintenance status (active devs, how quickly bugs get fixed after a
> release with an issue)
> - ease of use (@jit vs. Pythran comments vs. translation to .pyx syntax)
> - size of generated binaries
> - templating support for multiple dtypes
> - debugging and optimization experience/tools

> Debugging is one area where I'd say Numba is still worse than Cython;
> however, that's being resolved as we speak:
> https://github.com/numba/numba/issues/2788

> One thing I missed in the above list is dependencies: while our use of Cython
> only adds a build-time dependency, Numba would add a run-time dependency. Given
> that binary wheels and conda packages are available for all major platforms,
> that's not a showstopper, but it matters.

> Overall I'd say that:
> - Numba is better than Cython at: performance, ease of use, size of generated
> binaries, and templating support for multiple dtypes. Possibly also maintenance
> status right now.
> - Numba and Cython are about equally good at portability (I think; there is not
> much data about exotic platforms for Numba).
> - Cython is better than Numba at: maturity, debugging (though probably not for
> much longer), and dependencies.

> I'm usually pretty conservative in these things, but considering the above I'm
> leaning towards saying use of Numba should be allowed in the future. The added
> run-time dependency is the one major downside that's going to stay; however,
> compared to our Fortran headaches, that's a relatively small issue.

> Thoughts?

> Cheers,
> Ralf

> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev

-- 
    Gael Varoquaux
    Senior Researcher, INRIA Parietal
    NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux