
On 24/06/15 07:01, Eric Snow wrote:
Well, perception is 9/10ths of the law. :) If the multi-core problem is already solved in Python, then why does it fail in the court of public opinion? The perception that Python lacks a good multi-core story is real, leads organizations away from Python, and will not improve without concrete changes.
I think it is a combination of FUD and the lack of fork() on Windows. There is a lot of utterly wrong information about CPython and its GIL. The reality is that Python is used on even the largest supercomputers. The scalability problem seen on those systems is not the GIL but module import. If 1000 CPython processes import modules like NumPy simultaneously, they mount a "denial of service attack" on the file system: the module importer generates a huge number of failed open() calls while trying to locate the module files. This is even described in a paper on how to avoid it on an IBM Blue Gene/P: "As an example, on Blue Gene P just starting up Python and importing NumPy and GPAW with 32768 MPI tasks can take 45 minutes!" http://www.cs.uoregon.edu/research/paracomp/papers/iccs11/iccs_paper_final.p...

And while CPython is used for massively parallel computing, e.g. to model the global climate system, there is this FUD that CPython does not even scale up on a laptop with a single multi-core CPU. I don't know where it comes from, but it is more FUD than truth.

The main answers to FUD about the GIL and Python in scientific computing are these:

1. Python in itself incurs a 200x to 2000x performance hit compared to C or Fortran. Do not write compute kernels in Python unless you can compile them with Cython or Numba. If you need speed, start by moving the performance-critical parts to Cython instead of optimizing for a few CPU cores.

2. If you can release the GIL, e.g. in Cython code, Python threads scale like any other native OS threads. They are real threads, not fake threads in the interpreter.

3. The 80-20, 90-10, or 99-1 rule: the majority of the code accounts for a small portion of the runtime. It is wasteful to optimize "everything". The more speed you need, the stronger this asymmetry will be. Identify the bottlenecks with a profiler and optimize those.

4. Using C or Java does not give you a faster hard drive or a faster network connection. You cannot improve on network access by using threads in C or Java instead of threads in Python. If your code is i/o bound, Python's GIL does not matter. Python threads do execute i/o tasks in parallel. (This is the major misunderstanding.)

5. Computationally intensive parts of a program are usually taken care of by libraries like BLAS, LAPACK, and FFTW. The Fortran code in LAPACK does not care if you called it from Python; it will be as fast as it can be, independent of Python. The Fortran code in LAPACK also has no concept of Python's GIL. LAPACK libraries like Intel MKL can use threads internally without asking Python for permission.

6. The scalability problem when using Python on a massive supercomputer is not the GIL but the module import.

7. When using OpenCL we write kernels as plain text, and Python is excellent at manipulating text, more so than C. The same applies to using OpenGL for computer graphics with GLSL shaders and vertex buffer objects. If you need the GPU, you can just as well drive it from Python on the CPU.

Sturla
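PS: A minimal sketch of point 4, using time.sleep as a stand-in for a blocking socket or disk read (blocking calls release the GIL the same way), showing that Python threads really do wait in parallel:

```python
# Four threads each block in an "I/O-like" call. time.sleep releases
# the GIL while waiting, just as socket recv() or file reads do.
# If the GIL serialized blocking calls, this would take ~1 second;
# because waiting threads release the GIL, it takes ~0.25 seconds.
import threading
import time

def io_task():
    time.sleep(0.25)  # blocking call; the GIL is released while waiting

start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"4 blocking tasks finished in {elapsed:.2f} s")  # ~0.25 s, not ~1 s
```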
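PPS: Point 3 in practice. The standard library's cProfile is enough to find the bottlenecks before reaching for more cores; the functions below (slow_part, fast_part) are made-up stand-ins for a real program:

```python
# Profile a toy program to see which function dominates the runtime.
# slow_part and fast_part are hypothetical names for illustration.
import cProfile
import io
import pstats

def slow_part():
    return sum(i * i for i in range(200_000))

def fast_part():
    return sum(range(100))

def main():
    for _ in range(10):
        fast_part()
    slow_part()

pr = cProfile.Profile()
pr.enable()
main()
pr.disable()

out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # slow_part dominates the cumulative time
```

Optimize (or move to Cython) only what the profiler names, and leave the rest in plain Python.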