[Python-ideas] solving multi-core Python

Eric Snow ericsnowcurrently at gmail.com
Thu Jun 25 03:57:19 CEST 2015


On Wed, Jun 24, 2015 at 10:28 AM, Sturla Molden <sturla.molden at gmail.com> wrote:
> The reality is that Python is used on even the largest supercomputers. The
> scalability problem seen on those systems is not the GIL, but module
> import. If we have 1000 CPython processes importing modules like NumPy
> simultaneously, they mount a "denial of service attack" on the file
> system. This happens because the module importer generates a huge number
> of failed open() calls while trying to locate the module files.
>
> There is even a paper describing how to avoid this on an IBM Blue
> Gene: "As an example, on Blue Gene P just starting up Python and importing
> NumPy and GPAW with 32768 MPI tasks can take 45 minutes!"
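
To put rough numbers on that mechanism: for each import, the old
path-based finder walks sys.path and probes every entry for every
recognized suffix, so the metadata traffic multiplies quickly across
tasks.  A back-of-the-envelope sketch (my own arithmetic, not the
paper's, and the import count is made up):

    import sys
    import importlib.machinery

    suffixes = importlib.machinery.all_suffixes()  # e.g. ['.py', '.pyc', '.so', ...]
    # Rough upper bound for the pre-3.3 importer: one probe per suffix
    # per sys.path entry, plus one for the package __init__ check.
    probes_per_import = len(sys.path) * (len(suffixes) + 1)

    tasks = 32768    # the Blue Gene/P figure quoted above
    imports = 500    # made-up count for NumPy + GPAW and their dependencies

    print("worst-case metadata operations on the shared filesystem:",
          probes_per_import * tasks * imports)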

I'm curious what difference this makes under Python 3.4 (or even 3.3).
Along with being almost entirely pure Python, the import system now
has some optimizations that help mitigate filesystem access
(particularly stat() calls).
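
For instance, each FileFinder now caches a listing of its directory,
and the finders themselves are cached in sys.path_importer_cache, so
locating a module is mostly an in-memory check rather than a series of
stat()/open() probes.  A minimal way to poke at those caches (just a
sketch; nothing here is specific to versions past 3.3):

    import importlib
    import sys

    # One cached finder per sys.path entry; each FileFinder keeps a
    # snapshot of its directory's contents instead of re-stat'ing every
    # candidate filename.
    for entry, finder in list(sys.path_importer_cache.items())[:5]:
        print(entry, "->", finder)

    # The price of the caching: files created after the snapshot may not
    # be seen until the caches are explicitly dropped.
    importlib.invalidate_caches()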

Regardless, have there been any attempts to address this situation?
I'd be surprised if there haven't. :)  Is the solution described in
the cited paper sufficient?  Earlier Barry brought up Emacs's unexec as
at least an inspiration for a solution.  I expect there are a number
of approaches; one is sketched below.  It would be nice to address
this somehow (though it's unrelated to my multi-core proposal), and I
would expect it to also have a bearing on interpreter start-up time.
If it's worth pursuing, then consider posting something to import-sig.
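
One userspace approach that comes to mind (a sketch only, not what the
paper proposes): pay the import cost exactly once in a parent process
and fork the workers from it, so the children inherit a fully
populated sys.modules without touching the filesystem.  That doesn't
map directly onto how MPI tasks get launched, but it illustrates the
direction, and it's the same trick that would help start-up time:

    import multiprocessing as mp

    def worker(i):
        import numpy  # already in sys.modules via fork: no disk traffic
        return float(numpy.asarray([i]).sum())

    if __name__ == "__main__":
        import numpy                  # the one expensive, filesystem-heavy import
        ctx = mp.get_context("fork")  # Unix-only; fork keeps the parent's sys.modules
        with ctx.Pool(4) as pool:
            print(pool.map(worker, range(8)))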

-eric

