Parallelization on multi-CPU hardware?
alanmk at hotmail.com
Wed Oct 6 13:38:17 CEST 2004
> So basically you either get a really huge number of locks (one per
> object) with enough potential for conflicts, deadlocks and all the
> other stuff to make it real slow down the ceval.
> One could use less granularity, and lock say the class of the object
> involved, but that wouldn't help that much either.
> So basically the GIL is a design decision that makes sense, perhaps it
> shouldn't be just called the GIL, call it the "very large locking
> granularity design decision".
Reading the above, one might be tempted to conclude that the presence of the
GIL in the cpython VM is actually a benefit, and perhaps that all
language interpreters should have one!
But the reality is that few other VMs have chosen to employ such a
"very large locking granularity design decision". If one were to believe
the arguments about the problems of fine-grained locking, one might be
tempted to conclude that other VMs such as the JVM and the CLR are
incapable of competing with the cpython VM, in terms of performance. But
Jim Hugunin's pre-release figures for IronPython performance indicate
that it can, in some cases, outperform cpython while running on just a
single processor, let alone when multiple processors are available. And
that is for an "interpreter running on an interpreter".
CPython's GIL does give a *small* performance benefit, but only when
there is a single execution pipeline. Once there is more than one
pipeline, it degrades performance.
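A rough way to see this for yourself is sketched below. It assumes a cpython build with the GIL; the function names and the loop size are my own choices for illustration, and the timings will vary from machine to machine, so I quote no figures. It runs the same pure-python CPU-bound loop first sequentially and then in two threads.

```python
import threading
import time


def count_down(n):
    """Pure-python CPU-bound loop (hypothetical workload for illustration)."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n, workers):
    """Run the workload `workers` times, one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    """Run the workload in `workers` threads at once; return elapsed seconds."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000  # arbitrary size, chosen only so the loop takes measurable time
    print("sequential: %.2fs" % run_sequential(n, 2))
    print("threaded:   %.2fs" % run_threaded(n, 2))
```

If the threads could really run python bytecode in parallel, the threaded figure on a 2-pipeline machine should approach half the sequential one. Comparing the two numbers on your own hardware shows how far the GIL keeps it from that.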
As I've already stated, I believe the benefits and trade-offs of the GIL
are arguable either way when there is a small number of processors
involved, e.g. 2 or fewer. But if chip makers are already producing chips
with 2 execution pipelines, then you can be sure it won't be too long
before they are shipping units with 4, 8, 16, 32, etc, execution
pipelines. As this number increases, the GIL will increasingly become a
bottleneck.
Contrarily, jython, ironpython, etc, will continue to benefit from the
enormous and massively-resourced optimisation efforts going into the JVM
and CLR respectively (e.g. JIT compilation), all without a single change
to the python code or the code interpreting it.
Lastly, as the number of execution pipelines in CPUs grows, what will
happen if/when they start talking back and forth to each other,
transputer-style, instead of executing mostly isolated and in parallel
as they do today? Transputers had the lovely concept of high-speed
hardware communication channels cross-linking all CPUs in the "array".
The reason this model never really took off back in the 80's was that
there were no familiar high-level language models for exploiting it.
IMHO, new python concepts such as generators are precisely the right
high-level concepts for enabling transputer style fine-granularity,
inter-execution-pipeline, on-demand, "pull" comms. This would put python
at an enormous conceptual advantage compared to other mainstream
languages, which generally don't have generator-style concepts. What a
shame that the GIL would allow only one such CPU at a time to
actually be running python code.
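To make the "pull" idea concrete, here is a minimal sketch of a two-stage generator pipeline. Everything runs in a single thread here; the stage names and the log list are hypothetical, and each stage merely stands in for what could be one execution pipeline. The point is only that the upstream stage does no work until the downstream stage asks for a value.

```python
from itertools import islice


def numbers(log):
    """Upstream stage: produce 0, 1, 2, ... and record each value it produces."""
    n = 0
    while True:
        log.append(n)  # note every value actually generated
        yield n
        n += 1


def squares(source):
    """Downstream stage: pull one value at a time from `source` and square it."""
    for value in source:
        yield value * value


log = []
pipeline = squares(numbers(log))

# Pull exactly three results; the infinite upstream stage is never run ahead.
first_three = list(islice(pipeline, 3))

print(first_three)  # [0, 1, 4]
print(log)          # [0, 1, 2]
```

The upstream generator is infinite, yet only three values are ever produced, because each one is generated on demand when the consumer pulls it.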
email alan: http://xhaus.com/contact/alan
 Transputer architecture
 Occam programming language