[Python-Dev] microthreading vs. async io

Joachim König-Baltes joachim.koenig-baltes at emesgarten.de
Mon Feb 26 10:50:27 CET 2007


Armin Rigo wrote:
> I just realized that this is not really true in the present context.
> If the goal is to support programs that "look like" they are
> multi-threaded, i.e. don't use callbacks, as I think is Joachim's goal,
> then most of the time the wait() function would be only called with a
> *single* event, rarely two or three, never more.  Indeed, in this model
> a large server is implemented with many microthreads: at least one per
> client.  Each of them blocks in a separate call to wait().  In each such
> call, only the events relevant to that client are mentioned.
>   
Yes exactly.

> In other words, the cost is O(n), but n is typically 1 or 2.  It is not
> the total number of events that the whole application is currently
> waiting on.  Indeed, the scheduler code doing the real OS call (e.g. to
> select()) can collect the events in internal dictionaries, or in Poll
> objects, or whatever, and update these dictionaries or Poll objects with
> the 1 or 2 new events that a call to wait() introduces.  In this
> respect, the act of *calling* wait() already means "add these events to
> the set of all events that need waiting for", without the need for a
> separate API for doing that.
>   
But as I'd like to make the event structure similar to the BSD kevent
structure, we could use a flag in the event structure that tells the
scheduler whether to consider it only once or to keep it in its
dictionary; then the task would not need to supply the event each time.
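A minimal sketch of what such a flag could look like, loosely modeled on BSD kevent's EV_ONESHOT semantics (all names and constants here are assumptions for illustration, not an existing API):

```python
# Hypothetical flag constants, modeled on BSD kevent's EV_ONESHOT idea:
# does the scheduler forget the event once it fires, or keep it
# registered so the task need not resubmit it on every wait()?
EV_ONESHOT = 0x01   # deliver once, then remove from the scheduler
EV_PERSIST = 0x02   # keep registered across deliveries

class Event:
    """Sketch of an extensible kevent-like event structure."""
    def __init__(self, ident, filter, flags=EV_ONESHOT, udata=None):
        self.ident = ident     # fd, signal number, channel, ...
        self.filter = filter   # e.g. 'read', 'write', 'channel'
        self.flags = flags
        self.udata = udata     # opaque per-task data

    def persistent(self):
        # The scheduler would consult this after delivering the event.
        return bool(self.flags & EV_PERSIST)
```

With EV_PERSIST the scheduler keeps the entry in its internal dictionary after delivery, which is exactly what saves the task from supplying the event on each wait() call.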

> [I have experimented myself with a greenlet-based system giving wrapper
> functions for os.read()/write() and socket.recv()/send(), and in this
> style of code we tend to simply spawn new greenlets all the time.  Each
> one looks like an infinite loop doing a single simple job: read some
> data, process it, write the result somewhere else, start again.  (The
> loops are not really infinite; e.g. if sockets are closed, an exception
> is generated, and it causes the greenlet to exit.)  So far I've managed
> to always wait on a *single* event in each greenlet, but sometimes it
> was a bit contrived and being able to wait on 2-3 events would be
> handy.]
>   
I do not spawn new greenlets all the time. Instead, my tasks either call
wait(...) directly or call wrappers for read/write/send/recv... that
implicitly call wait(...) until enough data is available; the wait(...)
yields to the scheduler, which can either continue other tasks or call
kevent/poll/select if no task is runnable.

What I'd like to see in an API/library:

* a standard scheduler that is easily extensible
* an event structure/class that is easily extensible

E.g. I've extended the kevent structure for the scheduler to also
include channels similar to Stackless. These are Python-only
communication structures, so there is no OS support for blocking on
them, but the scheduler can decide whether something is available for a
task that waits on a channel. The channels are therefore checked first
in the scheduler to see if a task can continue, and only if no channel
event is available does the scheduler call kevent/select/poll.

While the scheduler blocks in kevent/select/poll, nothing happens on the
channels as no task is running, so the scheduler never blocks (inside
the OS) unnecessarily.
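The channels-first scheduling loop can be sketched as a toy round-robin scheduler; here tasks are plain generators and all names are illustrative assumptions, not the actual kevent-based implementation:

```python
import collections
import os
import select

class Channel:
    """Python-only communication structure (Stackless-style sketch):
    the OS cannot block on it, so the scheduler must check it itself."""
    def __init__(self):
        self.items = collections.deque()
    def send(self, value):            # never blocks in this toy version
        self.items.append(value)

class Scheduler:
    """Tasks are generators yielding ('recv', channel) or ('read', fd)."""
    def __init__(self):
        self.runnable = collections.deque()   # (task, value to send in)
        self.chan_waiters = []                # (task, channel) pairs
        self.fd_waiters = {}                  # fd -> task

    def spawn(self, task):
        self.runnable.append((task, None))

    def run(self):
        while self.runnable or self.chan_waiters or self.fd_waiters:
            # 1. Run every runnable task up to its next wait().
            while self.runnable:
                task, value = self.runnable.popleft()
                try:
                    kind, target = task.send(value)
                except StopIteration:
                    continue                  # task finished
                if kind == 'recv':
                    self.chan_waiters.append((task, target))
                else:                         # 'read'
                    self.fd_waiters[target] = task
            # 2. Channels first: pure-Python events need no OS call.
            still_waiting = []
            for task, chan in self.chan_waiters:
                if chan.items:
                    self.runnable.append((task, chan.items.popleft()))
                else:
                    still_waiting.append((task, chan))
            self.chan_waiters = still_waiting
            if self.runnable:
                continue
            # 3. Only when no task is runnable, block inside the OS.
            if self.fd_waiters:
                ready, _, _ = select.select(list(self.fd_waiters), [], [])
                for fd in ready:
                    task = self.fd_waiters.pop(fd)
                    self.runnable.append((task, os.read(fd, 4096)))
            elif self.chan_waiters:
                raise RuntimeError('deadlock: tasks wait on empty channels')
```

Because step 2 always runs before step 3, a task blocked on a channel can never be stranded behind an OS-level select() call, which is the property the paragraph above describes.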

Joachim
