[Python-ideas] Python 3000 TIOBE -3%

Stefan Behnel stefan_ml at behnel.de
Fri Feb 10 16:28:11 CET 2012


Massimo Di Pierro, 10.02.2012 15:52:
> Different languages have different mechanisms for taking advantage of
> multiple cores without forking. Python does not provide a mechanism and
> I do not know if anybody is working on one.

Seriously - what's wrong with forking? multiprocessing is so incredibly
easy to use that it's hard for me to understand why anyone would fight for
getting threading to do essentially the same thing, just less safe.
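
To give an idea of how little code that takes, here is a minimal sketch.
The names square() and parallel_squares() are made up for illustration,
and the explicit "fork" start method assumes a POSIX system:

```python
import multiprocessing as mp

# Assumption: POSIX system, so the "fork" start method is available.
ctx = mp.get_context("fork")


def square(n):
    # Stand-in for CPU-bound work; runs inside a worker process.
    return n * n


def parallel_squares(values, workers=2):
    # Distribute the calls over a pool of worker processes and
    # collect the results in input order.
    with ctx.Pool(processes=workers) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(parallel_squares(range(10)))
```

Each worker has its own interpreter and its own memory, so nothing is
shared unless you ask for it.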

Threading is a seriously hard problem, very tricky to get right and full of
land mines. Basically, you start from a field that's covered with one big
mine, and start cutting it down until you can get yourself convinced that
the remaining mines (if any, right?) are small enough to not hurt anyone.
They usually do anyway, but at least not right away.

This is generally worth a read (not necessarily for the conclusion, but
definitely for the problem description):

http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf


> In Python, currently, you can only do threading to parallelize your code
> without duplicating memory space, but performance decreases instead of
> increasing with number of cores.

Well, nothing keeps you from putting your data into shared memory if you
use multiple processes. It's not that hard either, but it has the major
advantage over threading that you can choose exactly what data should be
shared, so that you can more easily avoid race conditions and unintended
interdependencies.
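
A rough sketch of what I mean, assuming a POSIX system with the "fork"
start method. fill_slice() and squares_in_shared_memory() are hypothetical
names, and the array is the only piece of state the processes share:

```python
import multiprocessing as mp

# Assumption: POSIX system, so the "fork" start method is available.
ctx = mp.get_context("fork")


def fill_slice(shared, start, stop):
    # Each worker writes only to its own slice of the shared array,
    # so no lock is needed for these writes.
    for i in range(start, stop):
        shared[i] = i * i


def squares_in_shared_memory(n=8, workers=2):
    # The one explicitly shared object: an array of n doubles
    # allocated in shared memory, without a lock.
    shared = ctx.Array("d", n, lock=False)
    chunk = (n + workers - 1) // workers
    procs = []
    for w in range(workers):
        start, stop = w * chunk, min((w + 1) * chunk, n)
        p = ctx.Process(target=fill_slice, args=(shared, start, stop))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    return list(shared)


if __name__ == "__main__":
    print(squares_in_shared_memory())
```

Everything else in the workers stays private to them.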

Basically, you start from a safe split and then add explicit data sharing
and messaging until you have enough shared data and synchronisation points
to make it work, while still keeping up a safe and efficient concurrent
system. Note how this is the opposite of threading, where you start off
from the maximum possible unsafety where all state is shared, and then wade
through it with a machete trying to cut down unsafe interaction points. And
if you miss any one spot, you have a problem.
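
The messaging side can be sketched like this. It is only an illustration:
worker() and double_via_messages() are invented names, None is used as an
end-of-work marker by my own convention, and "fork" again assumes POSIX:

```python
import multiprocessing as mp

# Assumption: POSIX system, so the "fork" start method is available.
ctx = mp.get_context("fork")


def worker(tasks, results):
    # Read tasks until the None sentinel arrives; the two queues are
    # the only synchronisation points with the parent process.
    for item in iter(tasks.get, None):
        results.put((item, item * 2))


def double_via_messages(items):
    items = list(items)
    tasks, results = ctx.Queue(), ctx.Queue()
    proc = ctx.Process(target=worker, args=(tasks, results))
    proc.start()
    for item in items:
        tasks.put(item)
    tasks.put(None)  # tell the worker to stop
    # Drain the results before joining so the worker is never left
    # blocked on a full pipe.
    out = dict(results.get() for _ in items)
    proc.join()
    return out


if __name__ == "__main__":
    print(double_via_messages([1, 2, 3]))
```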


> This means threading is only good for
> concurrency not for scalability.

Yes, concurrency, or more specifically, I/O concurrency is still a valid
use case for threading.
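
For example, something along these lines, where time.sleep() is merely a
stand-in for a blocking network or disk call and fake_io() and
run_concurrently() are made-up names:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    # Blocking calls like this release the GIL, so other threads
    # can run while this one waits.
    time.sleep(delay)
    return delay


def run_concurrently(delays):
    # One thread per pending "request"; results come back in input order.
    # Assumes delays is a non-empty list.
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(fake_io, delays))


if __name__ == "__main__":
    print(run_concurrently([0.1, 0.1, 0.1]))
```

The waits overlap, so the total time should be close to the longest single
wait rather than the sum of all of them.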


> The GC vs reference counting (RC) is the heart of the matter. With RC
> every time a variable is allocated or deallocated you need to lock the
> counter because you do not know who else is accessing the same variable from
> another thread. This forces the interpreter to basically serialize the
> program even if you have threads, cores, coroutines, etc.
> 
> Forking is a solution only for simple toy cases and in trivially
> parallel cases. People use processes to parallelize web servers and task
> queues where the tasks do not need to talk to each other (except with
> the parent/master process). If you have 100 cores even with a small 50MB
> program, in order to parallelize it you go from 50MB to 5GB. Memory and
> memory access become a major bottleneck.

I think you should read up a bit on the various mechanisms for parallel
processing.

Stefan



