Even in the Erlang model, the afore-mentioned issues of bus contention put a cap on the number of threads you can run in any given application assuming there's any amount of cross-thread synchronization. I wrote a blog post on this subject with respect to my experience in tuning RabbitMQ on NUMA architectures.<div>
<div><br></div><div><meta http-equiv="content-type" content="text/html; charset=utf-8"><a href="http://blog.agoragames.com/blog/2011/06/24/of-penguins-rabbits-and-buses/">http://blog.agoragames.com/blog/2011/06/24/of-penguins-rabbits-and-buses/</a></div>
<div><br></div><div>It should be noted that Erlang processes are not the same as OS processes. They are more akin to green threads, scheduled on N number of legit OS threads which are in turn run on C number of cores. The end effect is the same though, as the data is effectively shared across NUMA nodes, which runs into basic physical constraints.</div>
<div><br></div><div>I used to think the GIL was a major bottleneck, and though I'm not fond of it, my recent experience has highlighted that *any* application which uses shared memory will have significant bus contention when scaling across all cores. The best course of action is shared-nothing MPI style, but in 64bit land, that can mean significant wasted address space.</div>
<div><br></div><div><a href="http://blog.agoragames.com/blog/2011/06/24/of-penguins-rabbits-and-buses/"></a>-Aaron</div><div><br><br><div class="gmail_quote">On Fri, Aug 12, 2011 at 2:59 PM, Sturla Molden <span dir="ltr"><<a href="mailto:sturla@molden.no">sturla@molden.no</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Den 12.08.2011 18:51, skrev Xavier Morel:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
* Erlang uses "erlang processes", which are very cheap preempted *processes* (no shared memory). There have always been tens to thousands to millions of erlang processes per interpreter source contention within the interpreter going back to pre-SMP by setting the number of schedulers per node to 1 can yield increased overall performances) <br>
</blockquote>
<br>
Technically, one can make threads behave like processes if they don't share memory pages (though they will still share address space). Erlangs use of 'process' instead of 'thread' does not mean an Erlang process has to be implemented as an OS process. With one interpreter per thread, and a malloc that does not let threads share memory pages (one heap per thread), Python could do the same.<br>
<br>
On Windows, there is an API function called HeapAlloc, which lets us allocate memory form a dedicated heap. The common use case is to prevent threads from sharing memory, thus behaving like light-weight processes (except address space is shared). On Unix, is is more common to use fork() to create new processes instead, as processes are more light-weight than on Windows.<br>
<br>
Sturla<br><br></blockquote></div><br>
</div></div>