
On 25 June 2015 at 02:28, Sturla Molden <sturla.molden@gmail.com> wrote:
On 24/06/15 07:01, Eric Snow wrote:
Well, perception is 9/10ths of the law. :) If the multi-core problem is already solved in Python, then why does it fail in the court of public opinion? The perception that Python lacks a good multi-core story is real, leads organizations away from Python, and will not improve without concrete changes.
I think it is a combination of FUD and the lack of fork() on Windows. There is a lot of utterly wrong information about CPython and its GIL.
The reality is that Python is used on even the largest supercomputers. The scalability problem seen on those systems is not the GIL, but module import. If we have 1000 CPython processes importing modules like NumPy simultaneously, they will mount a "denial of service attack" on the file system, because the module importer generates a huge number of failed open() calls while trying to locate the module files.
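To make the scale of that concrete, here's a rough illustration (the exact number of probes per module varies across CPython versions, so treat the arithmetic as indicative only):

    import sys
    import importlib.machinery

    # A missing module can be probed against every sys.path entry for
    # every recognised source/bytecode/extension suffix, so failed
    # lookups multiply quickly.
    suffixes = importlib.machinery.all_suffixes()
    probes = len(sys.path) * len(suffixes)
    print(len(sys.path), "path entries x", len(suffixes), "suffixes ~=",
          probes, "filesystem checks for one missing module")

Multiply that by the hundreds of modules a NumPy import pulls in, and then by 1000 simultaneous processes, and the shared file system's metadata server ends up taking the full load.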
Slight tangent, but folks hitting this issue on 2.7 may want to investigate Eric's importlib2: https://pypi.python.org/pypi/importlib2 It switches from stat-based searching for files to the Python 3.3+ model of directory-listing-based searches, which can (anecdotally) lead to a couple of orders of magnitude of improvement in startup time for code loading modules from NFS mounts.
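For illustration, the difference between the two models looks roughly like this (a simplified sketch of the idea, not importlib2's actual code; the function name and signature are mine):

    import os

    # Stat-based model: one failed stat()/open() per candidate file that
    # doesn't exist. Listing-based model (the Python 3.3+ approach): one
    # listdir() per path entry, with candidates checked against the
    # cached listing.
    def find_module(name, path_entries, suffixes=('.py',)):
        for entry in path_entries:
            try:
                contents = set(os.listdir(entry))  # one syscall per directory
            except OSError:
                continue  # skip unreadable or nonexistent path entries
            for suffix in suffixes:
                filename = name + suffix
                if filename in contents:  # in-memory check, no filesystem access
                    return os.path.join(entry, filename)
        return None

On a network file system, replacing many failed per-file lookups with one directory read per path entry is exactly where the startup win comes from (the real finder also caches those listings between imports).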
And while CPython is being used for massively parallel computing to, e.g., model the global climate system, there is this FUD that CPython does not even scale up on a laptop with a single multicore CPU. I don't know where it is coming from, but it is more FUD than truth.
Like a lot of things in the vast, sprawling Python ecosystem, I think there are aspects of this that are a discoverability problem more so than a capability problem. When folks first experiment with parallel execution, a lot of the time they start with computational problems like executing multiple factorials at once. That's trivial to do across multiple cores even with a threading model like JavaScript's worker threads, but can't be done in CPython without reaching for the multiprocessing module.

This is the one place where I'll concede that folks learning to program on Windows or the JVM, and hence getting the idea that "creating threads is fast, creating processes is slow", causes problems: folks playing with this kind of thing are far more likely to go "import threading" than "import multiprocessing" (and likewise for ThreadPoolExecutor vs ProcessPoolExecutor when using concurrent.futures), and their reaction when it doesn't work is far more likely to be "Python can't do this" than "I need to do this differently in Python from the way I do it in C/C++/Java/JavaScript".
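The working version of that first experiment only scales across cores in CPython when it reaches for processes rather than threads; a minimal sketch using concurrent.futures (numbers chosen arbitrarily):

    from concurrent.futures import ProcessPoolExecutor
    from math import factorial

    # CPU-bound work scales across cores with worker processes, where
    # threads would serialize on the GIL.
    def main():
        numbers = [50000, 50001, 50002, 50003]
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(factorial, numbers))
        print([len(str(r)) for r in results])  # digit counts keep output short

    if __name__ == '__main__':  # guard required for multiprocessing on Windows
        main()

Swap ProcessPoolExecutor for ThreadPoolExecutor and the same script runs the calculations one at a time under the GIL, which is precisely the surprise that trips newcomers up.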
The main answers to FUD about the GIL and Python in scientific computing are these:
It generally isn't scientific programmers I personally hit problems with (although we have to allow for the fact that many of the scientists I know, I met *because* they're Pythonistas). For that use case, there's not only HPC to point to, but a number of papers that talk about Cython and Numba in the same breath as C, C++ and FORTRAN, which is pretty spectacular company to be in when it comes to numerical computation. Being the fourth language Nvidia supported directly for CUDA doesn't hurt either.

Instead, the folks that I think have a more valid complaint are the games developers, and the folks trying to use games development as an educational tool. They're not doing array-based programming the way numeric programmers are (so the speed of the NumPy stack isn't any help), and they're operating on shared game state and frequently chattering back and forth between threads of control, so high-overhead message passing poses a major performance problem.

That does suggest to me a possible "archetypal problem" for the work Eric is looking to do here: a 2D canvas with multiple interacting circles bouncing around. We'd like each circle to have its own computational thread, but still be able to deal with the collision physics when they run into each other. We'll assume it's a teaching exercise, so "tell the GPU to do it" *isn't* the right answer (although it might be an interesting entrant in a zoo of solutions).

Key performance metric: frames per second. A rough threads-and-locks baseline sketch follows below.
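Purely as an illustration of the problem shape (all names and constants here are my own assumptions, not an existing API, and steps per second stands in for real rendering):

    import random
    import threading
    import time

    # Threads-and-locks sketch of the benchmark: one thread per circle,
    # shared world state behind a single lock, steps/sec as an FPS proxy.

    WIDTH, HEIGHT = 640.0, 480.0

    class Circle(object):
        def __init__(self):
            self.r = 10.0
            self.x = random.uniform(self.r, WIDTH - self.r)
            self.y = random.uniform(self.r, HEIGHT - self.r)
            self.vx = random.uniform(-120.0, 120.0)
            self.vy = random.uniform(-120.0, 120.0)
            self.frames = 0  # per-circle step counter (the FPS metric)

    class World(object):
        def __init__(self, n_circles):
            self.lock = threading.Lock()  # guards all shared game state
            self.circles = [Circle() for _ in range(n_circles)]
            self.running = True

        def step(self, c, dt):
            # Advance one circle and bounce it off the walls. Circle-vs-
            # circle collision physics would also need to run under this
            # lock, which is exactly where the contention comes from.
            with self.lock:
                c.x += c.vx * dt
                c.y += c.vy * dt
                if not (c.r <= c.x <= WIDTH - c.r):
                    c.vx = -c.vx
                if not (c.r <= c.y <= HEIGHT - c.r):
                    c.vy = -c.vy
                c.frames += 1

    def run_circle(world, c, dt=1.0 / 60.0):
        while world.running:
            world.step(c, dt)
            time.sleep(dt)  # crude frame pacing

    def main(n_circles=8, seconds=2.0):
        world = World(n_circles)
        threads = [threading.Thread(target=run_circle, args=(world, c))
                   for c in world.circles]
        for t in threads:
            t.start()
        time.sleep(seconds)
        world.running = False
        for t in threads:
            t.join()
        total = sum(c.frames for c in world.circles)
        print("~%.1f frames/sec per circle" % (total / seconds / n_circles))

    if __name__ == '__main__':
        main()

Alternative concurrency models would then compete against this threads-and-locks baseline on that frames-per-second number.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia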