multi-core software

Scott David Daniels Scott.Daniels at Acm.Org
Sun Jun 7 08:18:48 EDT 2009

Paul Rubin wrote:
> "Jeff M." <massung at> writes:
>>>> Even the lightest weight
>>>> user space ("green") threads need a few hundred instructions, minimum,
>>>> to amortize the cost of context switching....
>> There's always a context switch. It's just whether or not you are
>> switching in/out a virtual stack and registers for the context or the
>> hardware stack/registers.
> I don't see the hundreds of instructions in that case.  
> shows GHC doing 50 million lightweight thread switches in 8.47
> seconds, passing a token around a thread ring.  Almost all of that is
> probably spent acquiring and releasing the token's lock as the token
> is passed from one thread to another.  That simply doesn't leave time
> for hundreds of instructions per switch.
Remember, he said, "to amortize the cost of the context switch," not to
perform the switch itself. I stutter-stepped there myself.  One not
completely obvious place the cost shows up is in blowing the instruction
(and likely the data) cache.  I suspect it is tough to measure when that
cost applies,
but I expect he's likely right, except on artificial benchmarks, and
the nub of the problem is not on the benchmarks.  There is something
to be said for the good old days when you looked up the instruction
timings that you used in a little document for your machine, and could
know the cost of any loop.  We are faster now, but part of the cost of
that speed is that timing is a black art.  Perhaps modern operating
systems need the system call that was implemented in the Stanford AI Lab's
operating system -- phase of the moon.
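
For a rough sense of scale, the figures quoted above can be turned into a
per-switch budget. This is only arithmetic on the numbers in the quote; the
3 GHz clock and the function names are my assumptions, not anything from the
benchmark, and the result says nothing about cache effects in a real program.

```python
# Figures quoted above: 50 million thread switches in 8.47 seconds.
SWITCHES = 50_000_000
ELAPSED_SECONDS = 8.47

# Assumption (not from the thread): a 3 GHz clock.
ASSUMED_CLOCK_HZ = 3.0e9


def per_switch_ns(switches, elapsed_seconds):
    """Average wall-clock time per switch, in nanoseconds."""
    return elapsed_seconds / switches * 1e9


def per_switch_cycles(switches, elapsed_seconds, clock_hz):
    """Average clock cycles per switch at the given clock rate."""
    return elapsed_seconds / switches * clock_hz


def overhead_fraction(switch_cost, work_between_switches):
    """Share of total time spent switching, with both arguments in the
    same unit (cycles, nanoseconds, or instructions)."""
    return switch_cost / (switch_cost + work_between_switches)


if __name__ == "__main__":
    print("ns per switch:     %.1f" % per_switch_ns(SWITCHES, ELAPSED_SECONDS))
    print("cycles per switch: %.1f" % per_switch_cycles(
        SWITCHES, ELAPSED_SECONDS, ASSUMED_CLOCK_HZ))
    # Hypothetical example: a switch costing 100 units, amortized over
    # 300 units of useful work, is a quarter of the total time.
    print("overhead fraction: %.2f" % overhead_fraction(100, 300))
```

The last function is the amortization point: whatever the switch really
costs, cache refills included, the work done between switches has to be
several times larger before the overhead stops mattering.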

--Scott David Daniels
Scott.Daniels at Acm.Org
