CPython optimization

Olof Bjarnason olof.bjarnason at gmail.com
Fri Oct 23 09:45:06 CEST 2009


2009/10/22 MRAB <python at mrabarnett.plus.com>

> Olof Bjarnason wrote:
> [snip]
>
>  A short question after having read through most of this thread, on the
>> same subject (time-optimizing CPython):
>>
>> http://mail.python.org/pipermail/python-list/2007-September/098964.html
>>
>> We are experiencing multi-core processor kernels more and more these days.
>> But they are all still connected to the main memory, right?
>>
>> To me that means, even though some algorithm can be split up into several
>> threads that run on different cores of the processor, that any algorithm
>> will be memory-speed limited. And memory access is a quite common operation
>> for most algorithms.
>>
>> Then one could ask oneself: what is the point of multiple cores, if memory
>> bandwidth is the bottleneck? Specifically, what makes one expect any speed
>> gain from parallelizing a sequential algorithm into four threads, say, when
>> the memory shuffling is the same speed in both scenarios? (Assuming memory
>> access is much slower than ADDs, JMPs and such instructions - a quite safe
>> assumption I presume)
>>
>> [ If every core had its own primary memory, the situation would be
>> different. It would be more like the situation in a distributed/internet
>> based system, spread over several computers. One could view each core as a
>> separate computer actually ]
>>
>>  Don't forget about the on-chip cache! :-)
>

Sorry for continuing slightly OT:

Yes, that makes matters even more interesting.

Caches for single-CPU boards speed up memory access quite dramatically. Are
caches on multi-core chips shared among the cores, or does each core have a
separate cache? In any case, I can only imagine how complicated the
read/write logic of these tiny electronic devices must be.

Of course caches make memory-access operations much faster, but I'm
guessing register instructions are still orders of magnitude faster than
(cached) memory access. (Otherwise registers would not really be needed - you
could just view the whole primary memory as an array of registers!)
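As a rough illustration (a sketch, not a proper benchmark - the array size is an assumption about a typical machine's cache, and CPython's interpreter overhead dampens the effect compared to C), one can glimpse the cache hierarchy even from Python by walking the same large array sequentially versus in random order. The arithmetic is identical; only the memory-access pattern changes:

```python
import random
import time
from array import array

# ~16 MB of 8-byte integers - assumed to be larger than the CPU's caches
# (raise N if your machine has a bigger L3 cache).
N = 2_000_000
data = array("q", range(N))

seq_idx = list(range(N))
rand_idx = list(range(N))
random.shuffle(rand_idx)

def total(indices):
    # Identical work either way; only the order of memory accesses differs.
    s = 0
    for i in indices:
        s += data[i]
    return s

t0 = time.perf_counter(); s_seq = total(seq_idx); t_seq = time.perf_counter() - t0
t0 = time.perf_counter(); s_rand = total(rand_idx); t_rand = time.perf_counter() - t0

assert s_seq == s_rand  # same elements, so the sums must agree
print(f"sequential: {t_seq:.2f}s  random: {t_rand:.2f}s")
```

On my understanding, the random walk tends to come out noticeably slower once the array outgrows the cache, because nearly every access misses and has to go to main memory.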

So I think my first question is still interesting: what is the point of
multiple cores, if memory is the bottleneck?
(It helps to think of algorithms such as line-drawing or ray-tracing, which
are easy to parallelize, yet I believe still run faster on a single core
than on multiple cores because of the read/write-to-memory bottleneck. It
doesn't help to bring more workers to the mine if only one is allowed access
at a time; or, more likely, several are allowed yet it gets so crowded that
queuing/waiting is inevitable.)




-- 
twitter.com/olofb
olofb.wordpress.com
olofb.wordpress.com/tag/english

