[Python-ideas] Exploration PEP : Concurrency for moderately massive (4 to 32 cores) multi-core architectures
Krishna Sankar
ksankar at doubleclix.net
Sun Sep 16 23:34:10 CEST 2007
Folks,
For some reason (fat fingers ;o() I missed the introduction to the
proposal. Here is the full mail (pardon me for the spam):
As a follow-up to the py3k discussions started by Bruce and Guido, I
pinged Brett and he suggested I submit an exploratory proposal. Would
appreciate insights, wisdom, the good, the bad and the ugly.
A) Does it make sense ?
B) Which application sets should we consider in designing the
interfaces and implementations ?
C) In this proposal, parallelism and concurrency are used in an
interchangeable fashion. Thoughts ?
D) Please suggest pertinent links, discussions and insights.
E) I have kept the proposal to a minimum to start the discussions and
to explore if this is the right thing to do. Collaboratively, as we
zero-in on one or two approaches, the idea is to expand it to a crisp
and clear PEP. Need to do some more formatting as well.
------------------------------------------------------------------------------------------------------------
PEP: xxxxxxxx
Title: Concurrency for moderately massive (4 to 32 cores) multi-core
architectures
Version: $Revision$
Last-Modified: $Date$
Author: Krishna Sankar <ksankar (at) doubleclix.net>,
Status: Wandering ! (as in "Not all those who wander are lost ..."
-J.R.R.Tolkien)
Type: Process
Content-Type: text/x-rst
Created: 15-Sep-2007
Abstract
--------
This proposal aims to make multi-core capability an embedded mechanism
in Python. The question is not whether Python is slow or fast, but how
well a program can exploit and control parallelism/concurrency in a
moderately massive parallel world. The target is 4 to 32 cores. The
proposal advocates two mechanisms - one for task parallelism and another
for data intensive parallelism. Scientific computing and web 2.0
frameworks are the primary users for this proposal. Other applications
would benefit as well.
Rationale
---------
Multi-core architectures need no introduction and their ubiquity is
evident. It is imperative that Python has one or more standard ways of
leveraging them. OTOH, traditional thread-based concurrency and
lock-based exclusion are becoming more and more difficult to program
correctly.
First of all, the question is not whether Python is slow or fast but
the performance of a system written in Python. That means the ability to
leverage multi-core architectures as well as control: the ability to pin
one process/task to a core, the ability to pin one or more homogeneous
tasks to specific cores et al, as well as not having to wait for a
global lock and similar primitives. (Before anybody jumps to a
conclusion, this is not about the GIL by any means ;o))
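As a rough illustration of the kind of control meant here, pinning the
current process to a core can be sketched with os.sched_setaffinity.
That call exists only on some platforms (notably Linux), and the helper
name pin_current_process is hypothetical, not part of any proposal.

```python
import os

def pin_current_process(cores):
    """Restrict the calling process to the given core indices.

    Returns the affinity set in effect afterwards, or None if the
    platform lacks os.sched_setaffinity or refuses the request.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                          # e.g. macOS, Windows
    try:
        os.sched_setaffinity(0, set(cores))  # pid 0 means "this process"
    except OSError:
        return None                          # core not available to us
    return os.sched_getaffinity(0)
```

A caller would typically pick cores from os.sched_getaffinity(0), since
a container or batch scheduler may already have restricted that set.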
Second, it is clear that we need a good solution (not THE solution) for
moderately massive parallelism in multi-core architectures (i.e. 8-32
cores). Share-nothing might not be optimal; we need some form of memory
sharing, not just copying all data via messages. Maybe functional
programming based on the blackboard pattern would work, who knows.
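A minimal sketch of "pass a pointer, not the data", assuming a
present-day Python 3.8+ standard library (multiprocessing.shared_memory
and concurrent.futures both postdate this message). The names
scale_slice and scale_shared are hypothetical. Each message carries only
the block's name and a slice range; the floats themselves are never
copied between workers.

```python
import struct
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from multiprocessing import shared_memory

ITEM = struct.Struct("d")  # one 8-byte float per slot

def scale_slice(message):
    """Worker: attach to the shared block by name, scale a slice in place."""
    name, start, stop, factor = message
    block = shared_memory.SharedMemory(name=name)
    try:
        for i in range(start, stop):
            (value,) = ITEM.unpack_from(block.buf, i * ITEM.size)
            ITEM.pack_into(block.buf, i * ITEM.size, value * factor)
    finally:
        block.close()  # detach only; the owner unlinks the block
    return stop - start

def scale_shared(values, factor, workers=2, use_processes=True):
    """Owner: place values in shared memory and hand out slice messages."""
    n = len(values)
    block = shared_memory.SharedMemory(create=True, size=max(1, n) * ITEM.size)
    try:
        for i, v in enumerate(values):
            ITEM.pack_into(block.buf, i * ITEM.size, v)
        step = max(1, -(-n // workers))  # ceiling division
        messages = [(block.name, lo, min(lo + step, n), factor)
                    for lo in range(0, n, step)]
        pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
        with pool_cls(max_workers=workers) as pool:
            list(pool.map(scale_slice, messages))  # wait for every slice
        return [ITEM.unpack_from(block.buf, i * ITEM.size)[0] for i in range(n)]
    finally:
        block.close()
        block.unlink()
```

With use_processes=True the call should sit under an
if __name__ == "__main__": guard on platforms that start workers by
spawning a fresh interpreter. The slices do not overlap, so no locking
is needed in this sketch.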
I have seen saturated systems still showing only ~25% CPU utilization
(in a 4 core system!). It is because we didn't leverage multi-cores and
parallelism. So while py3k will not be slow, lack of a cohesive
multi-core strategy will show up in system performance and byte us
later (pun intended!).
At least, in my mind, this is not an exercise about exposing locks and
mutexes or threads in Python. I do believe that the GIL will be
refactored to more granularity in the coming months (similar to the
Global Locks in Linux) and most probably we will get microThreads et al.
As we all know, architecture is constraining as well as liberating. The
language primitives influence greatly how we think about a problem.
In the discussions, Guido is right in insisting on speed, and Bruce is
right in asking for language constructs. Without pragmatic speed, folks
won't use it; same is the case without the required constructs. Both are
barriers to adoption. We have an opportunity to offer a solution for
multi-core architectures and let us seize it - we will rush in where
angels fear to tread!
Programming Models
------------------
There are at least 3 possible paradigms:
A. Conventional threading model
B. Functional model, Erlang being the most appropriate
C. Some form of limited shared memory model (message passing but pass
pointers, blackboard model)
D. Others, like Transactional Memory [2]
There is enough literature out there, so I do not plan to explain these
here. (<KS> Do we need more explanation? </KS>)
Pragmatic proposal
------------------
May I suggest we embed two primitives in Python 3K:
A) A functional style share-nothing set of interfaces (and
implementations thereof) - provides the task parallelism/concurrency
capability, "small messages, big computations" as Joe Armstrong calls it[3]
B) A limited shared memory based model for data intensive parallelism
Most probably this would be part of stdlib. While Guido is almost right
in saying that this is a (std)library problem, it is not fully so. We
would need a few primitives from the underlying PVM substrate. Possibly
one reason for Guido's position is the lack of clarity as to what needs
to be changed and why. IMHO, just saying "take the GIL off" does not
solve the problem either.
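To make primitive (A) concrete, here is a minimal share-nothing sketch
in the "small messages, big computations" style, assuming the
present-day concurrent.futures module (which did not exist when this was
written). The names count_primes and primes_below are hypothetical. Each
message is just a pair of integers, and each worker returns one integer.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def count_primes(message):
    """Worker: count primes in [lo, hi). Shares no state with anyone."""
    lo, hi = message
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def primes_below(limit, chunks=4, use_processes=True):
    """Split [0, limit) into chunks and sum the per-chunk prime counts."""
    step = max(1, -(-limit // chunks))  # ceiling division
    messages = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=chunks) as pool:
        return sum(pool.map(count_primes, messages))
```

With use_processes=True the call should sit under an
if __name__ == "__main__": guard on platforms that start workers by
spawning a fresh interpreter. The use_processes=False path keeps
everything in one process, which is handy while debugging.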
The Zen of Python parallelism
-----------------------------
I draw inspiration from the very timely article by James Reinders in DDJ
[1]. It embodies what we should be doing, viz.:
1. Refactor the problem into parallel tasks. We cannot help if the
domain is sequential
2. Program to abstraction & program chores not cores. Writing correct
programs using raw threads et al is difficult. Let the underlying
substrate decide how best to optimize
3. Design for scale
4. Have an option to turn concurrency off, for debugging
5. Declarative parallelism based mechanisms (?)
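Points 2 and 4 above can be sketched together: the caller describes
chores and leaves the worker count to the substrate, and a single switch
falls back to a plain loop for debugging. This assumes the present-day
concurrent.futures module; run_chores is a hypothetical name, and a
thread pool is used only to keep the sketch short (a process pool could
be substituted for CPU-bound chores).

```python
from concurrent.futures import ThreadPoolExecutor

def run_chores(chore, inputs, enabled=True, max_workers=None):
    """Run chore(x) for each input; results come back in input order.

    enabled=False runs everything inline in the calling thread, so
    breakpoints and tracebacks behave as in ordinary sequential code.
    max_workers=None lets the library choose the pool size.
    """
    if not enabled:
        return [chore(x) for x in inputs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(chore, inputs))
```

Both modes return the same list, so a test suite can run with
concurrency off and the deployed system with it on.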
Related Efforts
---------------
The good news is there are at least 2 or 3 paradigms with
implementations and rough benchmarks.
Parallel python http://www.artima.com/weblogs/viewpost.jsp?thread=214303
http://cheeseshop.python.org/pypi/parallel
Processing http://cheeseshop.python.org/pypi/processing
http://code.google.com/p/papyros/
Discussions
-----------
There are at least four thread sets (pardon the pun !) I am aware of:
1. The GIL discussions in python-dev and Guido's blog on GIL
http://www.artima.com/weblogs/viewpost.jsp?thread=214235
2. The py3k topics started by Bruce
http://www.artima.com/weblogs/viewpost.jsp?thread=214112, response by
Guido http://www.artima.com/weblogs/viewpost.jsp?thread=214325 and reply
to reply by Bruce http://www.artima.com/weblogs/viewpost.jsp?thread=214480
3. Python and concurrency
http://mail.python.org/pipermail/python-ideas/2007-March/000338.html
References
[1] http://www.ddj.com/architect/201804248
[2] Transaction
http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=444
[3] Programming Erlang by Joe Armstrong