olof.bjarnason at gmail.com
Fri Oct 23 09:45:06 CEST 2009
2009/10/22 MRAB <python at mrabarnett.plus.com>
> Olof Bjarnason wrote:
>> A short question after having read through most of this thread, on the
>> same subject (time-optimizing CPython):
>> We are experiencing multi-core processor kernels more and more these days.
>> But they are all still connected to the main memory, right?
>> To me that means, even though some algorithm can be split up into several
>> threads that run on different cores of the processor, that any algorithm
>> will be memory-speed limited. And memory access is a quite common operation
>> for most algorithms.
>> Then one could ask oneself: what is the point of multiple cores, if memory
>> bandwidth is the bottleneck? Specifically, what makes one expect any speed
>> gain from parallelizing a sequential algorithm into four threads, say, when
>> the memory shuffling is the same speed in both scenarios? (Assuming memory
>> access is much slower than ADDs, JMPs and such instructions - a quite safe
>> assumption I presume)
>> [ If every core had its own primary memory, the situation would be
>> different. It would be more like the situation in a distributed/internet
>> based system, spread over several computers. One could view each core as a
>> separate computer actually ]
> Don't forget about the on-chip cache! :-)
Sorry for continuing slightly OT:
Yes, that makes matters even more interesting.
Caches for single-CPU boards speed up memory access quite dramatically. Are
caches for multi-core boards shared among the cores? Or does each core have a
separate cache? I can only imagine how complicated the read/write logic of
these tiny electronic devices must be, in any case.
Of course caches make memory access much faster, but I'm guessing register
instructions are still orders of magnitude faster than (cached) memory
access. (Otherwise registers would not really be needed - you could just view
the whole primary memory as an array of registers!)
So I think my first question is still interesting: What is the point of
multiple cores, if memory is the bottleneck?
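
To make the question concrete, here is a crude back-of-the-envelope model,
not a measurement. It assumes compute throughput scales with the number of
cores while all cores share one memory bus, and that the slower of the two
sets the runtime. The function names and the default hardware figures
(flops_per_core, mem_bandwidth) are made-up illustrative values, not data
for any real machine.

```python
def estimated_runtime(flops, bytes_moved, cores,
                      flops_per_core=1e9, mem_bandwidth=5e9):
    """Crude estimate in seconds: the slower of compute and memory wins.

    flops_per_core and mem_bandwidth are hypothetical figures chosen
    only for illustration.
    """
    compute_time = flops / (flops_per_core * cores)  # shrinks with more cores
    memory_time = bytes_moved / mem_bandwidth        # shared bus: does not shrink
    return max(compute_time, memory_time)


def speedup(flops, bytes_moved, cores, **hw):
    """Estimated speedup of `cores` cores over a single core."""
    one = estimated_runtime(flops, bytes_moved, 1, **hw)
    many = estimated_runtime(flops, bytes_moved, cores, **hw)
    return one / many


if __name__ == "__main__":
    # Lots of arithmetic per byte fetched: extra cores pay off in this model.
    print("compute-heavy:", speedup(1e10, 1e8, 4))
    # Little arithmetic per byte fetched: extra cores change nothing.
    print("memory-heavy: ", speedup(1e8, 1e10, 4))
```

Under these assumptions the answer depends on how much arithmetic the
algorithm does per byte it pulls from main memory; caches matter because
they reduce the bytes that actually have to cross the shared bus.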
(It helps to think of algorithms such as line-drawing or ray-tracing, which
are easy to parallelize, yet which I believe are still faster using a single
core instead of multiple because of the read/write-to-memory bottleneck. It
does not help to bring more workers to the mine if only one is allowed access
at a time or, more likely, if several are allowed in yet it gets so crowded
that queues and waiting are inevitable.)
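
One way to check this kind of belief is simply to time it. The sketch below
is only an illustration: it uses a toy workload (counting primes by trial
division) that does a lot of arithmetic and touches very little memory, and
splits it across worker processes with the standard multiprocessing module.
Processes are used rather than threads because CPython's GIL keeps threads
from running Python bytecode in parallel. The helper names (count_primes,
split, timed_count) and the problem size are arbitrary choices. It says
nothing by itself about memory-heavy workloads; swapping in one of those
would be the interesting experiment.

```python
import time
from multiprocessing import Pool


def count_primes(lo, hi):
    """Count primes in [lo, hi) by trial division (CPU-heavy, little memory)."""
    count = 0
    for n in range(max(lo, 2), hi):
        d = 2
        while d * d <= n:
            if n % d == 0:
                break          # found a divisor, n is not prime
            d += 1
        else:
            count += 1         # loop ended without a divisor: n is prime
    return count


def split(limit, parts):
    """Cut [0, limit) into `parts` contiguous (lo, hi) ranges."""
    step = limit // parts
    return [(i * step, limit if i == parts - 1 else (i + 1) * step)
            for i in range(parts)]


def timed_count(limit, workers):
    """Return (prime count, elapsed seconds) using `workers` processes."""
    start = time.perf_counter()
    with Pool(workers) as pool:
        total = sum(pool.starmap(count_primes, split(limit, workers)))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    for workers in (1, 2, 4):
        total, seconds = timed_count(200_000, workers)
        print(workers, "worker(s):", total, "primes in", round(seconds, 2), "s")
```

The ranges are not equally expensive (larger numbers take longer to test),
so even in the best case the timings will not scale perfectly with the
number of workers.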