[Python-ideas] Exploration PEP : Concurrency for moderately massive (4 to 32 cores) multi-core architectures

Krishna Sankar ksankar at doubleclix.net
Wed Sep 19 04:29:08 CEST 2007

    Good points.

> We don't necessarily
> need new language features, we simply need bright people to sit down
>and think about the right way to express parallelism in Python and
> then write libraries (maybe in the stdlib) that implement those ideas.
	Exactly. This PEP is about precisely that: thinking through how to express parallelism in Python.
	Also, the GIL is a challenge in only one implementation (I know, it is an important implementation!).
	My assumption is that the GIL restriction will be removed one way or another, soon. (same reason, quoting Joe Louis)

	What we need to do is not tackle these (i.e. GIL removal, parallelism) serially, but work on them simultaneously; that way they will leverage each other. Also, whatever paradigm(s) we zero in on can be implemented in the other implementations anyway.

	Moreover, IMHO, we need not force the GIL issue just yet. I firmly believe that it will find a solution in its own time frame ...

Brian Granger wrote:
> Thinking about how Python can better support parallelism and
> concurrency is an important topic.  Here is how I see it:  if we don't
> address the issue, the Python interpreter 5 or 10 years from now will
> run at roughly the same speed as it does today.  This is because
> single CPU cores are not getting much faster (power consumption is too
> high).  Instead, most of the performance gains in hardware will be due
> to increased hardware parallelism, which means multi/many core CPUs.
> What to do about this pending crisis is a complicated issue.
> There are (at least) two levels that are important:
> 1.  Language level features that make it possible to build
> higher-level libraries/tools for parallelism.
> 2.  The high-level libraries/tools that most users and developers
> would use to express parallelism.
> I think it is absolutely critical that we worry about (1) before
> jumping to (2).  So, some thoughts about (1).  Does Python itself need
> to be changed to better enable people to write libraries for
> expressing parallelism?
> My answer to this is no.  The dominant languages for parallel
> computing (C/C++/Fortran) don't really have any additional constructs
> or features above Python in this respect.  Java has a more
> sophisticated support for threads.  Erlang has concurrency built into
> its core.  But, Python is not Erlang or Java.  As Twisted
> demonstrates, Python as a language is plenty powerful enough to
> express concurrency in an elegant way.  I am not saying that
> parallelism and concurrency are easy or wonderful today in Python, just
> that the language itself is not the problem.  We don't necessarily
> need new language features, we simply need bright people to sit down
> and think about the right way to express parallelism in Python and
> then write libraries (maybe in the stdlib) that implement those ideas.
> But, there is a critical problem in CPython's implementation that
> prevents people from really breaking new ground in this area with
> Python.  It is the GIL and here is why:
> * For the platforms on which Python runs, threads are what the
> hardware+OS people have given to us as the most fine grained way of
> mapping parallelism onto hardware.  This is true, even if you have
> philosophical or existential problems with threads.  With the
> limitations of the GIL, we can't take advantage of what hardware gives
> to us.
> * A process based solution using message passing is simply not
> suitable for many parallel algorithms that are communications bound.
> The shared state of threads is needed in many cases, not because
> sharing state is a "fantastic idea", but rather because it is fast.
> This will only become more true as multicore CPUs gain more
> sophisticated memory architectures with higher bandwidths.  Also, the
> overhead of managing processes is much greater than with threads.
> Many excellent fine-grained parallel approaches like Cilk would not be
> possible with processes only.
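To make the shared-state contrast concrete, here is a minimal sketch (not from the original post) using the stdlib `threading` and `multiprocessing` modules: threads mutate a shared object in place, while processes must serialize and copy the same data across a pipe.

```python
import threading
import multiprocessing

# Threads share one address space: the worker appends to `shared` in
# place; no copying or serialization is involved.
shared = []
lock = threading.Lock()

def thread_worker():
    with lock:                      # guard the shared structure
        shared.extend(range(5))

t = threading.Thread(target=thread_worker)
t.start()
t.join()

# Processes do not share memory: the same data has to be pickled,
# pushed through a pipe, and unpickled on the other side.
def process_worker(q):
    q.put(list(range(5)))           # copied across the process boundary

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=process_worker, args=(q,))
    p.start()
    received = q.get()              # blocks until the copy arrives
    p.join()
```

For a handful of integers the copy is cheap, but for the large arrays typical of communications-bound algorithms the pickling and pipe traffic is exactly the overhead the paragraph above is describing.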
> * There are a number of powerful, high-level Python packages that
> already exist (these have been named in the various threads) that
> allow parallelism to be expressed.  All of these suffer from a GIL
> related problem even though they are process based and use message
> passing.  Regardless of whether you are using blocking/non-blocking
> sockets/IPC, you can't run long running CPU bound code, because all
> the network related stuff will stop.  You then think, "OK, I will run
> the CPU intensive stuff in a different thread."  If the CPU intensive
> code is just regular Python, you are fine, the Python interpreter will
> switch between the network thread and the CPU intensive thread every
> so often.  But the second you run extension code that doesn't release
> the GIL, you are screwed.  The network thread will die until the
> extension code is done.  When it comes to implementing robust process
> based parallelism using sockets, the last thing you can afford is to
> have your networking black out like this, and in CPython it can't be
> avoided.
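The switching behaviour described above can be sketched with a heartbeat thread standing in for the network thread (a simplified illustration, not from the original post): pure-Python CPU work lets the interpreter keep scheduling the heartbeat, whereas an extension call that held the GIL for its whole duration would leave the heartbeat frozen until it returned.

```python
import threading
import time

beats = []          # stands in for network events being serviced

def heartbeat(stop):
    # The "network" thread: it must keep running to service sockets.
    while not stop.is_set():
        beats.append(time.time())
        time.sleep(0.01)

def cpu_bound():
    # Pure-Python CPU work: the interpreter releases the GIL between
    # bytecodes every so often, so the heartbeat thread still runs.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total

stop = threading.Event()
t = threading.Thread(target=heartbeat, args=(stop,))
t.start()
cpu_bound()         # heartbeat keeps firing while this runs
stop.set()
t.join()
# `beats` is non-empty. If cpu_bound() instead called into extension
# code that never released the GIL, the heartbeat could not be
# scheduled for the entire duration of that call.
```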
> <disclaimer>
> I am not saying that threads are what everyone should be using to
> express parallelism.  I am only saying that they are needed to
> implement robust higher-level forms of parallelism on multicore
> systems, regardless of whether the solution is using processes + threads
> or threads alone.
> </disclaimer>
> Of the dozen or so "parallel Python" packages that currently exist,
> they _all_ suffer from this problem (some hide it better than others,
> though, using clever tricks).  We can run but we can't hide.
> Because of these things, I think the current "Exploratory PEP" is
> entirely premature.  Let's figure out exactly what to do with the GIL
> and _then_ think about the fun stuff.
> Brian
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
