[Python-ideas] Python 3000 TIOBE -3%

Massimo Di Pierro massimo.dipierro at gmail.com
Fri Feb 10 15:52:16 CET 2012

The way I see it, the issue is not whether Python has threads, fibers, coroutines, etc. The problem is that in 5 years we are going to have CPUs with 100 cores on the market (my phone has 2; my office computer has 8, not counting GPUs). Compilers/interpreters must be able to parallelize tasks across those cores without duplicating the memory space. Erlang may not have threads in the sense that it does not expose threads via an API, but it provides optional parallel schedulers that distribute coroutines automatically over the available cores/CPUs (http://erlang.2086793.n4.nabble.com/Some-facts-about-Erlang-and-SMP-td2108770.html). Different languages have different mechanisms for taking advantage of multiple cores without forking. Python does not provide such a mechanism, and I do not know if anybody is working on one.

In Python, currently, threading is the only way to parallelize your code without duplicating memory space, but for CPU-bound work performance decreases instead of increasing with the number of cores. This means threading is only good for concurrency, not for scalability.
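To make this concrete, here is a minimal sketch of the problem: a CPU-bound function run in four threads. The threads interleave correctly (this is the concurrency part), but because of the GIL only one thread executes Python bytecode at a time, so the wall-clock time is roughly the same as running the four calls serially, no matter how many cores you have.

```python
import threading

def count_down(n, results):
    # Pure CPU-bound work; never releases the GIL while running bytecode.
    while n > 0:
        n -= 1
    results.append(n)

results = []
threads = [threading.Thread(target=count_down, args=(500_000, results))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # 4 -- all threads finished, but effectively on one core
```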

GC vs. reference counting (RC) is the heart of the matter. With RC, every time an object's reference count is incremented or decremented you need to lock the counter, because you do not know who else may be accessing the same object from another thread. This forces the interpreter to essentially serialize the program even if you have threads, cores, coroutines, etc.
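You can observe the counting itself from Python (a small CPython-specific illustration; `sys.getrefcount` reports the count, and its own argument temporarily adds one reference). Every binding like `y = x` below is an increment that, without the GIL, would need atomic or locked access:

```python
import sys

x = object()
before = sys.getrefcount(x)  # includes the temporary reference held by getrefcount itself
y = x                        # creating a new reference increments the count
after = sys.getrefcount(x)

print(after - before)  # 1
```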

Forking is a solution only for toy cases and trivially parallel ones. People use processes to parallelize web servers and task queues, where the tasks do not need to talk to each other (except to the parent/master process). If you have 100 cores, even a small 50MB program goes from 50MB to 5GB once parallelized. Memory, and memory access, become a major bottleneck.



On Feb 10, 2012, at 3:29 AM, Mark Shannon wrote:

> There are a lot of things covered in this thread.
> I want to address 2 of them.
> 1. Garbage Collection.
> Python has garbage collection. There is no free() function in Python,
> anyone who says that Python does not have GC is talking nonsense.
> CPython uses reference counting as its means of implementing GC.
> Ref counting has different performance characteristics from tracing GC,
> but it only makes sense to consider this in the context of overall
> Python performance.
> One key disadvantage of ref-counting is that it does not play well with threads, which leads on to...
> 2. Global Interpreter Lock and Threads.
> The GIL is so deeply embedded into CPython that I think it cannot be removed. There are too many subtle assumptions pervading both the VM and 3rd party code, to make truly concurrent threads possible.
> But are threads the way to go?
> Javascript does not have threads. Lua does not have threads.
> Erlang does not have threads; Erlang processes are implemented (in the BEAM engine) as coroutines.
> One of the Lua authors said this about threads:
> (I can't remember the quote so I will paraphrase)
> "How can you program in a language where 'a = a + 1' is not deterministic?"
> Indeed.
> What Python needs are better libraries for concurrent programming based on processes and coroutines.
> Cheers,
> Mark.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
