[Numpy-discussion] Comment published in Nature Astronomy about The ecological impact of computing with Python

Sebastian Berg sebastian at sipsolutions.net
Tue Nov 24 12:25:02 EST 2020


On Tue, 2020-11-24 at 16:47 +0100, PIERRE AUGIER wrote:
> Hi,
> 
> I recently took a bit of time to study the comment "The ecological
> impact of high-performance computing in astrophysics" published in
> Nature Astronomy (Zwart, 2020, 
> https://www.nature.com/articles/s41550-020-1208-y, 
> https://arxiv.org/pdf/2009.11295.pdf), where it is stated that "Best
> however, for the environment is to abandon Python for a more
> environmentally friendly (compiled) programming language.".
> 
> I wrote a simple Python-Numpy implementation of the problem used for
> this study (https://www.nbabel.org) and, accelerated by Transonic-
> Pythran, it's very efficient. Here are some numbers (elapsed times in
> s, smaller is better):
> 
> > # particles |  Py | C++ | Fortran | Julia |
> > -------------|-----|-----|---------|-------|
> >     1024    |  29 |  55 |   41    |   45  |
> >     2048    | 123 | 231 |  166    |  173  |
> 
> The code and a modified figure are here: 
> https://github.com/paugier/nbabel (There is no check on the results
> for https://www.nbabel.org, so one still has to be very careful.)
> 
> I think that the Numpy community should spend a bit of energy to show
> what can be done with the existing tools to get very high performance
> (and low CO2 production) with Python. This work could be the basis of
> a serious reply to the comment by Zwart (2020).
> 
> Unfortunately the Python solution in https://www.nbabel.org is very
> bad in terms of performance (and therefore CO2 production). It is
> also true for most of the Python solutions for the Computer Language
> Benchmarks Game in 
> https://benchmarksgame-team.pages.debian.net/benchmarksgame/ (codes
> here 
> https://salsa.debian.org/benchmarksgame-team/benchmarksgame#what-else
> ).
> 
> We could try to fix this so that people see that in many cases, it is
> not necessary to "abandon Python for a more environmentally friendly
> (compiled) programming language". One of the longest and hardest tasks
> would be to implement the different cases of the Computer Language
> Benchmarks Game in standard and modern Python-Numpy. Then, optimizing
> and accelerating such code should be doable and we should be able to
> get very good performance at least for some cases. Good news for this
> project, (i) the first point can be done by anyone with good
> knowledge in Python-Numpy (many potential workers), (ii) for some
> cases, there are already good Python implementations and (iii) the
> work can easily be parallelized.
> 
> It is not a criticism, but the (beautiful and very nice) new Numpy
> website https://numpy.org/ is not very convincing in terms of
> performance. It's written "Performant The core of NumPy is well-
> optimized C code. Enjoy the flexibility of Python with the speed of
> compiled code." It's true that the core of Numpy is well-optimized C
> code but to seriously compete with C++, Fortran or Julia in terms of
> numerical performance, one needs to use other tools to move the
> compiled-interpreted boundary outside the hot loops. So it could be
> reasonable to mention such tools (in particular Numba, Pythran,
> Cython and Transonic).
> 
> Is there already something planned to answer Zwart (2020)?

I don't think there is any need for a rebuttal. The author is right:
you should not write the core of an N-Body simulation in Python
:).  I completely disagree with the focus on programming
languages/tooling, quite honestly.


A PhD student who writes performance-critical code must get the education
necessary to do it well.  That may mean learning something beyond
Python, but not replacing Python entirely.

At one point the comment notes:

    NumPy, for example, is mostly used for its advanced array handling
    and support functions. Using these will reduce runtime and,
    therefore, also carbon emission, but optimization is generally 
    stopped as soon as the calculation runs within an unconsciously
    determined reasonable amount of time, such as the coffee-refill
    timescale or a holiday weekend.

IMO, this applies to any other programming language just as much.  If
your correlation is fast enough, you will not invest time in
implementing an FFT-based algorithm.  If you iterate your array in
Fortran instead of C-order in your C++ program (which new users may
just do randomly), you are likely to waste more(!) CPU cycles than if
you were using NumPy :).
Personally, I am always curious how much of that "GPUs are faster"
factor is actually due to the effort spent on making it faster...
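
To make the correlation point concrete, here is a minimal sketch,
assuming real-valued 1-D inputs. The function names are made up for
illustration; only `np.correlate` and the `np.fft` routines are actual
NumPy calls. Both functions return the same full cross-correlation, but
the direct one costs O(n*m) and the FFT one O((n+m) log(n+m)):

```python
import numpy as np


def correlate_direct(a, b):
    """Full cross-correlation by direct summation, O(n * m)."""
    return np.correlate(a, b, mode="full")


def correlate_fft(a, b):
    """Same result via the FFT for real 1-D inputs."""
    n = len(a) + len(b) - 1
    # For real inputs, correlating a with b equals convolving a with
    # the reversed b. Zero-padding both to length n makes the circular
    # FFT convolution match the linear one.
    fa = np.fft.rfft(a, n)
    fb = np.fft.rfft(b[::-1], n)
    return np.fft.irfft(fa * fb, n)


rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = rng.standard_normal(200)

# The two results should agree up to floating-point round-off.
max_diff = np.max(np.abs(correlate_direct(a, b) - correlate_fft(a, b)))
```

Whether the second version is worth writing depends entirely on the
input sizes, which is the point: nobody bothers until the direct version
is too slow.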


My angle is that in the end, it is far more about technical knowledge
than about using the "right" language.

An example: at an old workplace we had some simulations running
five times slower because, years earlier, someone forgot to set
`RELEASE=True` in the default config, always compiling in debug mode!

But honestly, if it had been 5 times faster, we would probably have
done at least 3 times as many simulations :).
Aside from that, most complex C/C++ programs can probably be sped up
significantly just as well.

In the end, my main reading is that code running on power-hungry
machines (clusters, workstations) should maybe be audited for
performance. Yes!  (Although even then, resources tend to get used, no
matter how much you have!)


As for actually doing something to reduce the carbon footprint, I think
the vast majority of our users would have more impact if they throttled
their CPUs a bit rather than worrying about what tool they use to do
their job :).

Cheers,

Sebastian


> 
> Any opinions or suggestions on this potential project?
> 
> Pierre
> 
> PS: Of course, alternative Python interpreters (PyPy, GraalPython,
> Pyjion, Pyston, etc.) could also be used, especially if HPy (
> https://github.com/hpyproject/hpy) is successful (C core of Numpy
> written in HPy, Cython able to produce HPy code, etc.). However, I
> tend to be a bit skeptical in the ability of such technologies to
> reach very high performance for low-level Numpy code (performance
> that can be reached by replacing whole Python functions with
> optimized compiled code). Of course, I hope I'm wrong! IMHO, it does
> not remove the need for a successful HPy!
> 
> --
> Pierre Augier - CR CNRS                 
> http://www.legi.grenoble-inp.fr
> LEGI (UMR 5519) Laboratoire des Ecoulements Geophysiques et
> Industriels
> BP53, 38041 Grenoble Cedex,
> France                tel:+33.4.56.52.86.16
