On Mon, Jul 26, 2010 at 12:00 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
> At Resolver Systems we created a "calculation system" that does large calculations on background threads using IronPython. Doing them on a background thread allows the UI to remain responsive. Several calculations could run simultaneously using multiple cores.
>
> As the calculation operates on a large object graph (which the UI then needs access to in order to display it), using multiprocessing would have imposed a very big overhead due to serialization / deserialization (the program runs on Windows). [...]
>
> All the best,
> Michael
Hey,

(De)serialization being a much bigger cost than cache invalidation for a small number of threads that each do a lot of work is definitely a common "problem" (in quotes, because as you mentioned: it actually *works*!). There are a number of ways that CSP tries to solve that (generally involving more locking), but they are not currently applicable to CPython because of the state of the GIL. Unfortunately, CSP theory appears to predict that this approach starts breaking down at around 16 or so cores. Since x86-64 CPUs (Opterons) are currently available with 12 cores, with their 16-core bigger brothers coming in 2011, I guess now would be a good time to start worrying about it :-) (There's a quick timing sketch of that serialization cost in the PS below.)

I'd like to chime in with my experience with E, because they've run into this problem (processors want many processes to keep their cores busy, but (de)serialization makes that prohibitive) and tried to solve it (and I think they did well). As always when I talk about E, I'm not suggesting everyone drop everything and do this, but it might be interesting to look at.

(Disclaimer: the following explanation makes minor concessions to pedant-proof levels of purity, in the interest of giving everyone an idea that's correct enough to reason about on an abstract level -- people who are interested, please read the Wikipedia bits, they're surprisingly good :-))

E introduces a concept called "vats". Each vat has an event queue, its own stack, and some number of objects. Vats run on top of real processes, and each process hosts 0..N vats. The advantage is that vats don't share namespaces, but can (though don't necessarily) share memory spaces. So messaging between vats *can* be cheap (I'm unfamiliar with threading under .NET, but if it's similar to how it happens on the JVM: same ballpark), yet a vat is completely oblivious to whether it's running in the same process as another vat or talking to a completely different one running on a CPU on the other side of the world. Because inter-vat message passing is explicit, vats can also run in parallel with no issues. (There's a toy sketch of this in the second PS below.)

The simplest way to implement this would be a vat-local GIL (I realise the name GIL no longer makes sense there) for each vat, with the runtime (most likely written in C(ython)) and the objects inside each vat contending for it. Or, to put it less excitingly: we've reinvented threads, and they're called vats now! (The advantage is that you get the distributed nature, and only pay for it when you actually need it.)

Computers are reasonably good at this sort of scheduling (putting the appropriate vats together), but it wouldn't be unthinkable to have the programmer hint at it. You just have to be careful not to take it too far and end up in gcc territory, where higher optimization levels include things like "ignore programmer hints".

Caveat emptor: E has always cared much more about capabilities (so, the security aspect) than about parallel execution.

thanks for reading
Laurens
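PS: if you want to feel the (de)serialization cost first-hand, here's a tiny timing sketch in plain stdlib Python. The 100k-dict graph is a made-up stand-in for a real calculation result, so the absolute numbers mean nothing -- only the contrast with handing a reference to a thread does.

import pickle
import time

# A made-up object graph: 100k small dicts standing in for real
# calculation results. Only its size matters, not its shape.
graph = [{"id": i, "children": list(range(10))} for i in range(100000)]

start = time.time()
blob = pickle.dumps(graph, pickle.HIGHEST_PROTOCOL)
graph_copy = pickle.loads(blob)
print("pickle round trip: %.3fs, %d bytes" % (time.time() - start, len(blob)))

# Handing the same graph to another thread, by contrast, is just a
# reference copy: no serialization, no second copy of the data.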
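PPS: and to make vats concrete, here's a toy sketch in pure Python (threading + queue from the stdlib). To be clear, the Vat class and its send method are my own illustrative names, not E's actual API, and a real implementation would give each vat its own lock rather than running under the shared GIL:

import queue
import threading

class Vat:
    """A toy vat: one event loop, one queue, and the objects it owns.

    Objects owned by a vat are only ever touched from the vat's own
    thread, so they need no locks; the queue is the only shared thing.
    """

    def __init__(self, name):
        self.name = name
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, func, *args):
        # The only way into a vat is an explicit message; vats share
        # no namespace with each other.
        self._queue.put((func, args))

    def _run(self):
        while True:
            func, args = self._queue.get()
            if func is None:  # sentinel: shut this vat down
                break
            func(*args)

    def stop(self):
        self._queue.put((None, ()))
        self._thread.join()

a, b = Vat("a"), Vat("b")
a.send(print, "hello from vat a")
b.send(print, "hello from vat b")
a.stop(); b.stop()

Because the only way into a vat is its queue, the same send() could just as well serialize the message and push it over a socket to a vat on another machine -- which is the "only pay for distribution when you actually need it" property.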