>Even for problems where it appears trivial, there can be hidden
>issues, like false cache coherency communication where no actual
>sharing is taking place. Or locks that appear to have low contention and
>negligible performance impact on ``only'' 8 processors suddenly turn into
>bottlenecks. Then there is NUMA. A given address in memory may be
>RAM attached to the processor accessing it, or to another processor,
>with very different access costs.

Could what you are saying be summed up as, "The more threads you
have, the more important it is to keep them independent, sharing
as little data as possible"?
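
To make sure I understand the distinction, here is a rough sketch of the two styles as I picture them. The function names `sum_shared` and `sum_independent` are made up for illustration, and since CPython's GIL serializes the bytecode this shows only the structure, not the actual speed difference you would measure on real hardware.

```python
import threading


def sum_shared(data, n_threads):
    """Every thread updates one shared total, taking one lock per element."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for x in chunk:
            with lock:          # all threads contend on this single lock
                total += x

    threads = [
        threading.Thread(target=worker, args=(data[i::n_threads],))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_independent(data, n_threads):
    """Each thread works on private state and publishes one result at the end."""
    partials = [0] * n_threads

    def worker(idx):
        local = 0               # private to this thread, nothing shared
        for x in data[idx::n_threads]:
            local += x
        partials[idx] = local   # the only write to shared memory

    threads = [
        threading.Thread(target=worker, args=(i,))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)        # merge once, after all threads have finished


if __name__ == "__main__":
    numbers = list(range(1000))
    print(sum_shared(numbers, 4), sum_independent(numbers, 4))
```

Both return the same answer, but the second touches shared memory once per thread instead of once per element. If I follow the false-sharing point correctly, in a lower-level language even the `partials` slots would need padding out to a cache line each, since neighbouring slots written by different threads would otherwise still bounce the same line between processors.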
