On Tue, 2020-11-24 at 16:47 +0100, PIERRE AUGIER wrote:
Hi,
I recently took a bit of time to study the comment "The ecological impact of high-performance computing in astrophysics" published in Nature Astronomy (Zwart, 2020, https://www.nature.com/articles/s41550-020-1208-y, https://arxiv.org/pdf/2009.11295.pdf), where it is stated that "Best however, for the environment is to abandon Python for a more environmentally friendly (compiled) programming language.".
I wrote a simple Python-Numpy implementation of the problem used for this study (https://www.nbabel.org) and, accelerated by Transonic-Pythran, it's very efficient. Here are some numbers (elapsed times in s, smaller is better):
# particles |  Py | C++ | Fortran | Julia
------------|-----|-----|---------|------
       1024 |  29 |  55 |      41 |    45
       2048 | 123 | 231 |     166 |   173
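To make the discussion concrete, here is a minimal sketch of the kind of vectorized NumPy kernel involved. This is illustrative only, not the actual code from the nbabel repository; the function name and shapes are assumptions. With Transonic, decorating such a function (e.g. with `@jit`) lets Pythran compile the hot computation to native code without changing the NumPy source.

```python
import numpy as np

def compute_accelerations(positions, masses):
    """Gravitational accelerations for all particles (G = 1 in N-body units).

    positions: (n, 3) array, masses: (n,) array -> (n, 3) accelerations.
    Illustrative sketch; not the code from https://github.com/paugier/nbabel.
    """
    # Pairwise separation vectors r_ij = x_j - x_i, shape (n, n, 3)
    diff = positions[np.newaxis, :, :] - positions[:, np.newaxis, :]
    dist2 = np.sum(diff**2, axis=-1)
    np.fill_diagonal(dist2, 1.0)   # avoid division by zero on the diagonal
    inv_d3 = dist2**-1.5
    np.fill_diagonal(inv_d3, 0.0)  # a particle exerts no force on itself
    # a_i = sum_j m_j * r_ij / |r_ij|^3
    return np.einsum("ij,j,ijk->ik", inv_d3, masses, diff)
```

The point is that the same source can run interpreted (for debugging) or ahead-of-time compiled (for production), which is how the timings above were obtained.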
The code and a modified figure are here: https://github.com/paugier/nbabel (There is no check on the results for https://www.nbabel.org, so one still has to be very careful.)
I think that the Numpy community should spend a bit of energy to show what can be done with the existing tools to get very high performance (and low CO2 production) with Python. This work could be the basis of a serious reply to the comment by Zwart (2020).
Unfortunately, the Python solution in https://www.nbabel.org is very bad in terms of performance (and therefore CO2 production). The same is true for most of the Python solutions to the Computer Language Benchmarks Game at https://benchmarksgame-team.pages.debian.net/benchmarksgame/ (codes here: https://salsa.debian.org/benchmarksgame-team/benchmarksgame#what-else).
We could try to fix this so that people see that in many cases, it is not necessary to "abandon Python for a more environmentally friendly (compiled) programming language". One of the longest and hardest tasks would be to implement the different cases of the Computer Language Benchmarks Game in standard and modern Python-Numpy. Then, optimizing and accelerating such code should be doable and we should be able to get very good performance at least for some cases. Good news for this project: (i) the first point can be done by anyone with good knowledge of Python-Numpy (many potential workers), (ii) for some cases, there are already good Python implementations, and (iii) the work can easily be parallelized.
It is not a criticism, but the (beautiful and very nice) new Numpy website https://numpy.org/ is not very convincing in terms of performance. It says: "Performant: The core of NumPy is well-optimized C code. Enjoy the flexibility of Python with the speed of compiled code." It's true that the core of Numpy is well-optimized C code, but to seriously compete with C++, Fortran or Julia in terms of numerical performance, one needs other tools to move the compiled-interpreted boundary outside the hot loops. So it could be reasonable to mention such tools (in particular Numba, Pythran, Cython and Transonic).
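To illustrate what "moving the compiled-interpreted boundary outside the hot loops" means, here is a small, hypothetical example (the function names are made up for illustration). The slow version crosses the Python/C boundary once per element; the fast version crosses it once per array, so the hot loop runs entirely inside NumPy's compiled core. Tools like Numba, Pythran or Cython go further and compile whole Python functions, including explicit loops.

```python
import numpy as np

def norms_slow(points):
    """Row norms with a Python-level loop: many tiny C calls, large overhead."""
    out = np.empty(len(points))
    for i, p in enumerate(points):       # interpreted loop
        out[i] = np.sqrt(np.sum(p * p))  # boundary crossed once per row
    return out

def norms_fast(points):
    """Same result, but the loop over rows happens inside NumPy's C core."""
    return np.sqrt(np.einsum("ij,ij->i", points, points))
```

Both return the same values; only where the loop executes (interpreter vs compiled code) differs, and that is what dominates the runtime.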
Is there already something planned to answer to Zwart (2020)?
I don't think there is any need for a rebuttal. The author is right: you should not write the core of an N-body simulation in Python :). I completely disagree with the focus on programming languages/tooling, quite honestly. A PhD who writes performance-critical code must get the education necessary to do it well. That may mean learning something beyond Python, but not replacing Python entirely.

In one point the opinion notes:

> NumPy, for example, is mostly used for its advanced array handling and support functions. Using these will reduce runtime and, therefore, also carbon emission, but optimization is generally stopped as soon as the calculation runs within an unconsciously determined reasonable amount of time, such as the coffee-refill timescale or a holiday weekend.

IMO, this applies to any other programming language just as much. If your correlation is fast enough, you will not invest time in implementing an FFT-based algorithm. If you iterate your array in Fortran instead of C order in your C++ program (which new users may just do randomly), you are likely to waste more(!) CPU cycles than if you were using NumPy :). Personally, I am always curious how much of that "GPUs are faster" factor is actually due to the effort spent on making it faster...

My angle is that, in the end, it is far more about technical knowledge than about using the "right" language. An example: at an old workplace we had some simulations running five times slower because, years earlier, someone forgot to set `RELEASE=True` in the default config, so everything was always compiled in debug mode! But honestly, if it had been 5 times faster, we would probably have done at least 3 times as many simulations :). Aside from that, most complex C/C++ programs can probably be sped up significantly just as well.

In the end, my main reading is that code running on power-hungry machines (clusters, workstations) should maybe be audited for performance. Yes!
(Although even then, resources tend to get used, no matter how many you have!)

As for actually doing something to reduce the carbon footprint, I think the vast majority of our users would have more impact if they throttled their CPUs a bit rather than worrying about what tool they use to do their job :).

Cheers,
Sebastian
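Sebastian's remark about iterating an array in Fortran instead of C order can be sketched in a few lines. This is an illustrative example, not code from the thread: the values and results are identical in both layouts, but traversing along the contiguous axis walks memory sequentially (cache-friendly), while traversing across it strides through memory, which is where the wasted CPU cycles come from.

```python
import numpy as np

# Same data in two memory layouts
a_c = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)  # C order (row-major)
a_f = np.asfortranarray(a_c)                                      # Fortran order (column-major)
assert a_c.flags["C_CONTIGUOUS"] and a_f.flags["F_CONTIGUOUS"]

# Row sums touch contiguous memory in a_c but strided memory in a_f;
# the results are identical, only the access pattern (and speed) differs.
assert np.array_equal(a_c.sum(axis=1), a_f.sum(axis=1))
```

NumPy's reductions internally pick a sensible traversal order where they can, which is part of why a naive NumPy program can beat a naively written compiled loop.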
Any opinions or suggestions on this potential project?
Pierre
PS: Of course, alternative Python interpreters (PyPy, GraalPython, Pyjion, Pyston, etc.) could also be used, especially if HPy (https://github.com/hpyproject/hpy) is successful (C core of Numpy written in HPy, Cython able to produce HPy code, etc.). However, I tend to be a bit skeptical about the ability of such technologies to reach very high performance for low-level Numpy code (performance that can be reached by replacing whole Python functions with optimized compiled code). Of course, I hope I'm wrong! IMHO, it does not remove the need for a successful HPy!
--
Pierre Augier - CR CNRS  http://www.legi.grenoble-inp.fr
LEGI (UMR 5519) Laboratoire des Ecoulements Geophysiques et Industriels
BP53, 38041 Grenoble Cedex, France  tel: +33.4.56.52.86.16
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion