[Python-Dev] Thoughts fresh after EuroPython
Laurens Van Houtven
lvh at laurensvh.be
Tue Jul 27 14:00:26 CEST 2010
On Mon, Jul 26, 2010 at 12:00 PM, Michael Foord
<fuzzyman at voidspace.org.uk> wrote:
> At Resolver Systems we created a "calculation system" that does large
> calculations on background threads using IronPython. Doing them on a
> background thread allows the ui to remain responsive. Several calculations
> could run simultaneously using multiple cores.
>
> As the calculation operates on a large object graph (which the ui then needs
> access to in order to display it) using multiprocessing would have imposed a
> very big overhead due to serialization / deserialization (the program runs
> on windows).
[...]
> All the best,
>
> Michael
Hey,
(De)serialization being a much bigger cost than cache invalidation,
for a small number of threads that each do a lot of work, is
definitely a common "problem" (in quotes, because as you mentioned: it
actually *works*!). There are a number of ways that CSP tries to solve
that (generally involving more locking), but they are not currently
applicable to CPython because of the state of the GIL. Unfortunately,
CSP theory appears to predict that this approach starts breaking down
around 16 or so cores. Since x86-64 CPUs (Opterons) are currently
available with 12 cores, with a 16-core bigger brother coming in
2011, I guess now would be a good time to start worrying about it :-)
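The cost Michael describes is easy to reproduce. Here's a quick sketch
(the object shape and sizes are made up, purely to illustrate where
the overhead comes from):

```python
import pickle
import time

# A stand-in for a "large object graph": ~100k small nodes. The shape
# and size here are invented just to make the overhead visible.
graph = {i: {"children": list(range(10)), "label": "node-%d" % i}
         for i in range(100_000)}

# Handing this graph to another *process* (e.g. via a
# multiprocessing.Queue) means pickling it on one side and unpickling
# it on the other -- a full copy, paid on every exchange.
start = time.perf_counter()
blob = pickle.dumps(graph, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
elapsed = time.perf_counter() - start

# Handing the same graph to another *thread* is just a reference copy,
# which is why the threaded design wins here despite the GIL.
print("serialized %d bytes in %.3fs" % (len(blob), elapsed))
```

The round trip cost grows with the size of the graph, and a design
that crosses the process boundary pays it for every result the UI
needs to display.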
I'd like to chime in with my experience from E, because they've run
into this problem (you want many processes to exploit many processors,
but (de)serialization makes that prohibitive) and tried to solve it
(and I think they did well). As always when I talk about E, I'm not
suggesting everyone drop everything and do this, but it might be
interesting to look at.
(Disclaimer: the following explanation makes minor concessions to
pedant-proof levels of purity in the interest of giving everyone an
idea that's correct enough to reason about on an abstract level --
people who are interested, please read the Wikipedia bits, they're
surprisingly good :-))
E introduces a concept called "vats". A vat has an event queue, its
own stack, and N objects. Vats run on top of real processes, each of
which hosts 0..N vats. The advantage is that vats don't share
namespaces but can (though don't necessarily) share memory spaces. So,
messaging between vats *can* be cheap (I'm unfamiliar with threading
under .NET, but if it's similar to how it happens on the JVM: same
ballpark), yet a vat is completely oblivious to whether it's running
in the same process as another vat or talking to one running on a CPU
on the other side of the world.
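A minimal sketch of the idea in Python (the class and method names
here are mine, not E's actual API):

```python
import queue

class Vat:
    """A toy E-style vat: an event queue plus the objects it owns.
    Illustrative only -- E's real vats do much more than this."""
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()   # the vat's event queue
        self.objects = {}            # objects owned by this vat alone

    def send(self, target_vat, message):
        # Messaging is always explicit: drop an event in the other
        # vat's queue. Same-process delivery is just a reference copy;
        # remote delivery would serialize, but the sender can't tell.
        target_vat.inbox.put((self.name, message))

    def run_one_turn(self):
        # A vat handles one event at a time, so the objects inside it
        # never see concurrent access.
        sender, message = self.inbox.get()
        return "%s got %r from %s" % (self.name, message, sender)

# One OS process can host several vats.
a, b = Vat("a"), Vat("b")
a.send(b, "hello")
print(b.run_one_turn())
```

The key property is that the `send` call looks the same whether the
target vat lives in this process, another process, or another machine.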
Because inter-vat message passing is explicit, these vats can also run
in parallel with no issues. The simplest way to implement this would
be a vat-local GIL (I realise the name GIL no longer makes sense
there) for each vat, with the process (most likely written in
C(ython)) and the objects inside each vat contending for it.
Or, put less excitingly: we've reinvented threads, and they're called
vats now! (The advantage is that you get the distributed nature, and
only pay for it when you actually need it.)
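A toy sketch of the vat-local-lock idea, using Python threads. (Under
CPython's actual GIL these threads won't truly run in parallel, of
course; the point is only the structure: one lock per vat instead of
one global lock, so two vats never contend with each other.)

```python
import queue
import threading

class LockedVat:
    """Each vat serializes its own turns with its own lock; two vats
    could then take turns simultaneously on different cores."""
    def __init__(self):
        self.lock = threading.Lock()   # the vat-local "GIL"
        self.inbox = queue.Queue()
        self.state = 0                 # an object owned by this vat

    def deliver(self, n):
        self.inbox.put(n)

    def run(self):
        while True:
            n = self.inbox.get()
            if n is None:              # sentinel: shut the vat down
                break
            with self.lock:            # protects only this vat's objects
                self.state += n

vats = [LockedVat() for _ in range(2)]
threads = [threading.Thread(target=v.run) for v in vats]
for t in threads:
    t.start()
for v in vats:
    for _ in range(100):
        v.deliver(1)
    v.deliver(None)
for t in threads:
    t.join()
print([v.state for v in vats])   # → [100, 100]
```

Since only the owning vat ever touches its own state, the per-vat lock
is never contended across vats -- that's the whole trick.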
Computers are reasonably good at this sort of scheduling (putting the
appropriate vats together), but it wouldn't be unthinkable to have the
programmer hint at it. You just have to be careful not to take it too
far and end up in gcc territory, where higher optimization levels
include things like "ignore programmer hints".
Caveat emptor: E has always cared much more about capabilities (i.e.,
the security aspect) than about parallel execution.
Thanks for reading,
Laurens