
On Wed, Jun 24, 2015 at 10:28 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
The reality is that Python is used on even the largest supercomputers. The scalability problem that is seen on those systems is not the GIL, but the module import. If we have 1000 CPython processes importing modules like NumPy simultaneously, they will do a "denial of service attack" on the file system. This happens when the module importer generates a huge number of failed open() calls while trying to locate the module files.
There is even a paper describing how to avoid this on an IBM Blue Gene: "As an example, on Blue Gene P just starting up Python and importing NumPy and GPAW with 32768 MPI tasks can take 45 minutes!"
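
To make the workaround concrete, here is a minimal sketch (not the paper's actual implementation, just an assumed shape of the usual approach): only rank 0 touches the filesystem, and the module source is broadcast to the other ranks. It assumes mpi4py is available and only handles a single pure-Python module; packages and extension modules such as NumPy's shared libraries need considerably more machinery.

import sys
import types
import importlib.util

from mpi4py import MPI   # assumption: mpi4py is installed on the machine

comm = MPI.COMM_WORLD

def broadcast_import(name):
    # Hypothetical helper: rank 0 does the sys.path search and reads the
    # source; every other rank receives it over MPI instead of hammering
    # the shared filesystem with its own failed open()/stat() calls.
    if comm.rank == 0:
        spec = importlib.util.find_spec(name)      # the only filesystem search
        source = spec.loader.get_source(name)
        origin = spec.origin
    else:
        source, origin = None, None
    source, origin = comm.bcast((source, origin), root=0)

    module = types.ModuleType(name)
    module.__file__ = origin
    exec(compile(source, origin or "<broadcast>", "exec"), module.__dict__)
    sys.modules[name] = module
    return module

In practice this would be hooked into sys.meta_path rather than called by hand, and extension modules still have to be loaded from disk, so a sketch like this only removes the path-search traffic.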
I'm curious what difference there is under Python 3.4 (or even 3.3). Along with being almost entirely pure Python, the import system now has some optimizations that help mitigate filesystem access (particularly stat calls).

Regardless, have there been any attempts to address this situation? I'd be surprised if there haven't. :) Is the solution described in the cited paper sufficient? Earlier Barry brought up Emacs's unexec as at least an inspiration for a solution. I expect there are a number of approaches.

It would be nice to address this somehow (though it's unrelated to my multi-core proposal). I would also expect it to have a bearing on interpreter start-up time. If it's worth pursuing then consider posting something to import-sig.

-eric
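
As a rough illustration of the caching mentioned above (a sketch assuming CPython 3.3+, where the path-based finders cache directory listings): each searched sys.path entry gets a FileFinder that lists the directory once and answers later "is this module here?" questions from that cached listing, so repeated imports don't re-stat every candidate file.

import importlib
import sys

import json  # any stdlib import will do; the search populates the finder cache

# Each sys.path entry that has been searched maps to a cached finder object.
for path_entry, finder in list(sys.path_importer_cache.items())[:5]:
    print(path_entry, "->", type(finder).__name__)   # typically FileFinder or None

# If another process writes new modules into those directories, the cached
# listings go stale; invalidate_caches() forces them to be re-read.
importlib.invalidate_caches()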