multi-core software

Lew noone at lewscanon.com
Sun Jun 7 11:16:46 EDT 2009


Scott David Daniels wrote:
> the nub of the problem is not in the benchmarks.  There is something
> to be said for the good old days when you looked up the instruction
> timings that you used in a little document for your machine, and could
> know the cost of any loop.  We are faster now, but part of the cost of
> that speed is that timing is a black art.

Those good old days never existed.  Those manuals never accounted for things 
that affected timing even then, like memory latency or refresh time.  SRAM 
cache made things worse, since the published timings never mentioned 
cache-miss delays.  Though memory cache might seem a recent innovation, it's 
been around a while.  It would be hard to find any timing document, published
at any point since the commercialization of computers, that actually told you
the cost of an arbitrary loop.
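
A rough sketch of the point, in Python since that's where we are.  The
buffer size, stride, and labels are arbitrary picks, CPython's interpreter
overhead will mask much of the effect, and the numbers will swing from run
to run - which is rather the point:

    import random
    import time

    def sum_bytes(buf, indices):
        # Identical bytecode on every call; only the memory access
        # pattern differs, so any timing gap comes from the memory
        # hierarchy, not from anything an instruction table could say.
        total = 0
        for i in indices:
            total += buf[i]
        return total

    N = 1 << 22                          # 4 MiB, bigger than most caches
    buf = bytearray(N)
    sequential = list(range(0, N, 64))   # roughly one touch per cache line
    scattered = sequential[:]
    random.shuffle(scattered)            # same indices, randomized order

    for label, idx in (("sequential", sequential), ("scattered", scattered)):
        start = time.perf_counter()
        sum_bytes(buf, idx)
        print("%-10s %.4fs" % (label, time.perf_counter() - start))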

Things got worse when chips like the '86 family acquired multiple instructions 
for doing loops, still worse when pre-fetch pipelines became deeper and wider, 
absolutely Dark Art due to multi-level memory caches becoming universal, and 
throw-your-hands-up-and-leave-for-the-corner-bar with multiprocessor NUMA 
systems.  OSes and high-level languages complicate the matter - you never know
how big a time slice you'll get, how your source got compiled, or how it gets
optimized at run time.
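
You can watch that jitter without leaving the standard library.  The
statement and the repeat counts below are arbitrary; the spread you see
will depend on the machine and whatever else it happens to be doing:

    import timeit

    # Time the identical statement repeatedly; the spread across runs
    # reflects scheduler preemption, cache state, and interpreter
    # warm-up - none of which appears in any published timing table.
    runs = timeit.repeat("sum(range(10000))", number=1000, repeat=10)
    print("min %.4fs  max %.4fs  spread %.0f%%"
          % (min(runs), max(runs),
             100.0 * (max(runs) - min(runs)) / min(runs)))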

So the good old days are a matter of degree and self-deception - it was easier
then to fool ourselves that we could at least guess timings proportionately, if
not absolutely, but timing has only grown more unpredictable as the hardware
has evolved.

-- 
Lew


