[Python-Dev] uthread strawman

Thu, 9 Nov 2000 12:28:41 -0800

Christian Tismer wrote:
> Does anybody know of a useful example where continuations
> are really needed? 

Well, I don't know about needed, but let me explain a possible server
architecture, and then see what newfangled control structure could help it
become more manageable (irrespective of whether the perf numbers of such an
architecture would actually be worth the overhead in current CPython).

In a high performance server, any non-CPU intensive operation which blocks
the thread you're on reduces your scalability potential.

A fairly common way of writing multi-threaded servers is to have one client
per thread, whether via a thread pool or just a simplistic
"create thread, execute work, end thread" approach.

Threads are expensive, and each additional one increases the context switch
penalty your server pays.

An alternative which reduces the context switch penalty dramatically is to
use a thread-safe work item queue and N threads, where N is usually some
small multiple of the number of CPUs and is smaller than the number of
threads a thread pool would use.
To keep these threads from blocking, you use an asynchronous state
machine: you start the long-running operation asynchronously, and on
completion of this operation you insert the request state back into the
thread-safe queue.
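
As a rough sketch of the queue-plus-state-machine idea above (in modern
Python, not the C/C++ code mentioned later): the state names, the request
dict layout, and start_async_read() are all made up for illustration, and a
threading.Timer stands in for a real asynchronous I/O completion.

```python
import queue
import threading

work_queue = queue.Queue()  # the thread-safe work item queue


def start_async_read(request):
    # Hypothetical stand-in for an asynchronous operation: a timer fires the
    # completion callback later, so no worker thread blocks while waiting.
    def on_complete():
        request["data"] = "payload-%d" % request["id"]
        request["state"] = "PROCESS"
        work_queue.put(request)  # completion re-queues the request state

    threading.Timer(0.01, on_complete).start()


def step(request):
    # One transition of the state machine; it must never block.
    if request["state"] == "START":
        request["state"] = "WAITING"
        start_async_read(request)
    elif request["state"] == "PROCESS":
        request["result"] = request["data"].upper()
        request["done"].set()


def worker():
    while True:
        request = work_queue.get()
        if request is None:  # sentinel: shut this worker down
            break
        step(request)


def run(num_requests=5, num_workers=2):
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    requests = [
        {"id": i, "state": "START", "done": threading.Event()}
        for i in range(num_requests)
    ]
    for r in requests:
        work_queue.put(r)
    for r in requests:
        r["done"].wait()
    for _ in workers:
        work_queue.put(None)
    for w in workers:
        w.join()
    return [r["result"] for r in requests]
```

Note that N workers serve any number of in-flight requests; a request only
occupies a thread while one of its transitions is actually running.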

You can further extend this idea by reducing the number of threads to
one per CPU, with each thread bound directly to its CPU and not allowed to
run on other CPUs.
This tries to prevent the threads from migrating between CPUs and ruining
the CPU cache, etc.
A work item must be able to be executed on any of these threads.
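
For illustration only, binding one thread to each CPU might look like the
sketch below. It assumes a platform where os.sched_setaffinity exists (e.g.
Linux, where pid 0 refers to the calling thread) and falls back to not
pinning anywhere else. The function names are hypothetical, and in CPython
the GIL would still serialize the Python-level work.

```python
import os
import threading


def pinned_worker(cpu, pinned, index):
    # Try to bind the calling thread to a single CPU. This call is
    # platform-specific, so failure just means "not pinned".
    try:
        os.sched_setaffinity(0, {cpu})
        pinned[index] = cpu
    except (AttributeError, OSError):
        pinned[index] = None
    # A real worker would now loop, pulling work items off the shared queue.


def start_one_thread_per_cpu():
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))  # CPUs we are allowed to use
    else:
        cpus = list(range(os.cpu_count() or 1))
    pinned = [None] * len(cpus)
    threads = [
        threading.Thread(target=pinned_worker, args=(cpu, pinned, i))
        for i, cpu in enumerate(cpus)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cpus, pinned
```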

Another extension of this idea is to bundle these work items into separate
queues based on the state in the state machine.
The aim is to avoid unnecessary CPU cache flushing.
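
A single-threaded sketch of the per-state queues, just to show the
scheduling order. The three state names are invented, and the "handler" does
nothing but record what ran; the point is that all pending items in one
state are drained as a batch before moving on to the next state.

```python
from collections import deque

STATES = ["PARSE", "LOOKUP", "REPLY"]  # hypothetical request states


def run_batched(request_ids):
    queues = {state: deque() for state in STATES}
    for rid in request_ids:
        queues["PARSE"].append(rid)

    trace = []  # records (state, request id) in execution order
    while any(queues.values()):
        for i, state in enumerate(STATES):
            pending = queues[state]
            # Drain everything currently queued for this state, so the
            # same handler code runs back to back.
            for _ in range(len(pending)):
                rid = pending.popleft()
                trace.append((state, rid))
                if i + 1 < len(STATES):
                    queues[STATES[i + 1]].append(rid)
    return trace
```

With a single shared queue the same work would instead tend to interleave
states from different requests.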

The downside of this approach is that asynchronous state machines are a pain
to debug, maintain, understand, write, etc.
(BTW, for the curious: the above architecture does allow you to achieve much
higher perf for certain tasks than other ways of structuring the code. It
has been tested and used extensively in some internal C/C++ code, not mine.)

The thought occurs to me that continuations would definitely help in this
situation. 
* You'd have more debugging state
* The code could be organized around needs other than the boundaries
  between blocking operations.
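
To make the second point concrete, here is a sketch using modern Python
generators as a stand-in. Generators are closer to coroutines than to full
continuations, so this is an analogy and not the uthread/continuation
proposal itself. fake_async() and the ("read", ...) tuples are hypothetical;
a real server would start the operation and resume the handler only on
completion.

```python
from collections import deque


def handle_request(request_id):
    # Reads top to bottom like blocking code. Each yield marks a point where
    # an asynchronous operation is started and the worker thread is released.
    header = yield ("read", "header-%d" % request_id)
    body = yield ("read", "body-%d" % request_id)
    return "%s|%s" % (header, body)


def fake_async(operation):
    # Hypothetical stand-in for an async operation; it completes immediately.
    kind, arg = operation
    return arg.upper()


def run_all(request_ids):
    # Each ready entry is (request id, suspended handler, value to resume with).
    ready = deque((rid, handle_request(rid), None) for rid in request_ids)
    results = {}
    while ready:
        rid, task, value = ready.popleft()
        try:
            operation = task.send(value)  # run until the next blocking point
        except StopIteration as stop:
            results[rid] = stop.value  # the handler finished
            continue
        # "Completion" puts the suspended handler back on the queue.
        ready.append((rid, task, fake_async(operation)))
    return results
```

The request state that the state machine had to carry by hand now lives in
the suspended handler's local variables, which is also where the extra
debugging state comes from.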

But it's not clear to me (mostly because I haven't given it a lot of
thought) whether coroutines would suffice here.

Thoughts?

Bill