Parallelization on multi-CPU hardware?

Alex Martelli aleaxit at yahoo.com
Tue Oct 12 15:54:09 CEST 2004


Nicolas Lehuen <nicolas.lehuen at thecrmcompany.com> wrote:

> "Alex Martelli" <aleaxit at yahoo.com> wrote in message
> news:1gljja1.1nxj82c1a25c1bN%aleaxit at yahoo.com...
> > Nicolas Lehuen <nicolas at lehuen.com> wrote:
> > 
> > > problem was the same ; people expected mod_python to run in a
> > > multi-process context, not a multi-threaded context (I guess this is
> > > due to a Linux-centered mindset, forgetting about BSD, MacosX or Win32
> > > OSes). When I asked questions and pointed out problems, the answer was 'Duh
> > > - use Linux with the forking MPM, not Windows with the threading MPM'.
> > 
> > Sorry, I don't get your point.  Sure, Windows makes process creation
> > hideously expensive and has no forking.  But all kinds of BSD, including
> > MacOSX, are just great at forking.  Why is a preference for multiple
> > processes over threads "forgetting about BSD, MacOSX", or any other
> > flavour of Unix for that matter?
> 
> Because when you have multithreaded programs, you can easily share objects
> between different threads, provided you carefully implement them. On the
> web framework I wrote, this means sharing and reusing the same DB
> connection pool, template cache, other caches and so on. This means a
> reduced memory footprint and increased performance. In a multi-process
> environment, you have to instantiate as many connections, caches,
> templates etc. as you have processes. This is a waste of time and
> memory.
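
As a minimal sketch of the sharing described in the quoted paragraph (not
code from the framework under discussion): one cache object, guarded by a
lock, is used by every thread, so each entry is built once per process
rather than once per worker. The names SharedCache and run_workers, and the
fake "compile" step, are hypothetical and chosen only for illustration.

```python
import threading


class SharedCache:
    """A tiny thread-safe cache; a single instance is shared by all threads."""

    def __init__(self, factory):
        self._factory = factory        # builds a value for a missing key
        self._lock = threading.Lock()  # serializes access to the dict
        self._items = {}
        self.builds = 0                # how many times the factory ran

    def get(self, key):
        with self._lock:
            if key not in self._items:
                self._items[key] = self._factory(key)
                self.builds += 1
            return self._items[key]


def run_workers(cache, keys, n_threads=4):
    """Start n_threads threads that all look up the same keys, then wait."""

    def worker():
        for key in keys:
            cache.get(key)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Hypothetical stand-in for an expensive step such as compiling a template.
cache = SharedCache(lambda name: "<compiled %s>" % name)
run_workers(cache, ["index.html", "about.html"])
print(cache.builds)  # 2: each template was built once, not once per thread
```

With separate worker processes and no shared memory, each process would
hold its own SharedCache and run the factory itself, which is the
duplication the quoted paragraph complains about.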

I'm not particularly interested in debating the pros and cons of threads
vs processes, right now, but in getting a clarification of your original
assertion which I _still_ don't get.  It's still quoted up there, so
please DO clarify: how would "a Linux-centered mindset forgetting about
BSD" (and MacOSX is a BSD in all that matters in this context, I'd say)
bias one against multi-threading or towards multi-processing?  In what
ways are you claiming that Unix-like systems with BSD legacy are
inferior in multi-processing, or superior in multi-threading, to ones
with Linux kernels?  I think I understand both families decently well
and yet I _still_ don't understand whence this claim is coming.  (Forget
the red herring of Win32 -- nobody's disputing that _their_ process
spawning is a horror of inefficiency -- let's focus on Unixen, since
you did see fit to mention them so explicitly contrasted, hm?!).

_Then_, once that point is cleared, we may (e.g.) debate how the newest
Linux VM development (presumably coming in 2.6.9) may make mmap quite as
fast as sharing memory among threads (which, I gather from hearsay only,
is essentially the case today already... but _only_ for machines with no
more than 2GB, 3GB tops of physical memory being so shared -- the claim
is that the newest developments will let you have upwards of 256 GB of
physical memory shared with similar efficiency, by doing away with the
current pagetables overhead of mmap), and what (if anything) is there in
BSD-ish kernels to match that accomplishment.  But until you clarify
that (to me) strange and confusing assertion, to help me understand what
point you were making there (and nothing in this "answer" of yours is at
all addressing my doubts and my very specific question about it!), I see
little point in trying to delve into such esoterica.
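
For readers unfamiliar with the mechanism mentioned above, here is a rough
sketch of memory shared between processes through mmap rather than between
threads. It assumes a Unix-like system (os.fork is not available on
Windows), and sum_in_child is a hypothetical helper invented for this
illustration. It shows only that the sharing works, and says nothing about
the page-table costs under discussion.

```python
import mmap
import os
import struct


def sum_in_child(values):
    """Fork a child, let it write sum(values) into shared memory, read it back."""
    # Anonymous mapping (fileno -1); on Unix the default flag is MAP_SHARED,
    # so the pages stay shared between parent and child after fork().
    buf = mmap.mmap(-1, 8)
    pid = os.fork()
    if pid == 0:
        # Child process: write the result into the shared pages and exit
        # without running the parent's cleanup handlers.
        try:
            buf[:8] = struct.pack("q", sum(values))
        finally:
            os._exit(0)
    # Parent process: wait for the child, then read what it wrote.
    os.waitpid(pid, 0)
    (result,) = struct.unpack("q", buf[:8])
    buf.close()
    return result


print(sum_in_child([1, 2, 3]))  # prints 6
```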


> BTW [shameless plug] here is the cookbook recipe I wrote about thread-safe
> caching.
> 
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302997

I do not see any relevance of that recipe (with which I'm quite
familiar, since I'm preparing the 2nd Edition of the Cookbook) to your
assertion with Linux on one side, and BSD derivatives grouped with
Windows on the other, quoted, questioned and never clarified above.


> > > Well, this is not going to take us anywhere, especially with all the
> > > multicore CPUs coming.
> > 
> > Again, I don't get it.  Why would multicore CPUs be any worse than
> > current multi-CPU machines at multiple processes, and forking?
> 
> Obviously they won't be any worse. Well, to be precise, it still depends
> on the OS, because the scheduler must know the difference between 2
> processors and a 2-core processor to efficiently balance the work, but
> anyway.

As far as I know there is no special support in current kernels for
multi-core CPUs as differentiated from multiple CPUs sharing external
buses... (nor even yet, unless I'm mistaken, for such fundamental
novelties as HyperTransport, aka DirectConnect, which _has_ been around
for quite a while now -- and points to a completely different paradigm
than shared memory as being potentially much-faster IPC... bandwidths of
over 10 gigabytes/second, which poor overworked memory subsystems might
well have some trouble reaching... how one would exploit HyperTransport
within multiple threads of a process, programmed on the basis of sharing
memory, I dunno -- using it between separate processes which exchange
messages appears conceptually simpler).  Anyway, again to the limited
amount of my current knowledge, this holds just as much for multiple
threads as for multiple processes, no?

 
> What I meant is that right now I'm writing this on a desktop PC with
> hyperthreading. This means that even on a desktop PC you can benefit
> from having multithreaded (or multi-processed) applications.

I'm writing this on my laptop (uniprocessor, no quirks), because I'm on
a trip, but at home I do have a dual-processor desktop (actually a
minitower, but many powerful 'desktops' are that way), and it's a
year-old model (and Apple was making dual processors for years before
the one I own, though with 32-bit 'G4' chips rather than 64-bit 'G5'
ones they're using now).  So this is hardly news: I can run make on a
substantial software system much faster with a -j switch to let it spawn
multiple jobs (processes).

> With multicore or multiprocessor machines becoming more and more common, the
> pressure to have proper threading support in Python will grow and grow.

The pressure has been growing for a while and I concur it will keep
growing, particularly since the OS by far most widespread on desktops
has such horrible features for multiple-process spawning and control.
But, again, before we go on to debate this, I would really appreciate it
if you clarified your previous assertions, to help me understand why you
believe that BSD derivatives, including Mac OS X, are to be grouped on
the same side as Windows, while only Linux would favour processes over
threads -- when, to _me_, it seems so obvious that the reasonable
grouping is with all Unix-like systems on one side, Win on the other.
You either know something I don't, about the internals of these systems,
and it appears so obvious to you that you're not even explaining it now
that I have so specifically requested you to explain; or there is
something else going on that I really do not understand.


Alex


