[Python-Dev] microthreading vs. async io

Sun Feb 25 18:18:46 CET 2007

Hi Adam,

On Thu, Feb 15, 2007 at 06:17:03AM -0700, Adam Olsen wrote:
> > E.g. have a wait(events = [], timeout = -1) method would be sufficient
> > for most cases, where an event would specify
> 
> I agree with everything except this.  A simple function call would
> have O(n) cost, thus being unacceptable for servers with many open
> connections.  Instead you need it to maintain a set of events and let
> you add or remove from that set as needed.

I just realized that this is not really true in the present context.
If the goal is to support programs that "look like" they are
multi-threaded, i.e. don't use callbacks, as I think is Joachim's goal,
then most of the time the wait() function would only be called with a
*single* event, rarely two or three, never more.  Indeed, in this model
a large server is implemented with many microthreads: at least one per
client.  Each of them blocks in a separate call to wait().  In each such
call, only the events relevant to that client are mentioned.

In other words, the cost is O(n), but n is typically 1 or 2.  It is not
the total number of events that the whole application is currently
waiting on.  Indeed, the scheduler code doing the real OS call (e.g. to
select()) can collect the events in internal dictionaries, or in Poll
objects, or whatever, and update these dictionaries or Poll objects with
the 1 or 2 new events that a call to wait() introduces.  In this
respect, the act of *calling* wait() already means "add these events to
the set of all events that need waiting for", without the need for a
separate API for doing that.
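
To make the point above concrete, here is a rough sketch of such a
scheduler. It is not code from any existing system: generators stand in
for microthreads, "yield events" stands in for wait(events), and the
names Scheduler, spawn, echo_upper and demo are made up for the
illustration. It also assumes that a given socket is waited on by at
most one microthread at a time. The bookkeeping done per wait is
proportional to the 1 or 2 events passed in, not to the total number of
events the application is waiting on.

```python
import collections
import selectors
import socket


class Scheduler:
    """Hypothetical scheduler: generators play the role of microthreads."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()  # the shared event set
        self.ready = collections.deque()             # (generator, value) to resume
        self.registered = 0                          # events currently waited on

    def spawn(self, gen):
        self.ready.append((gen, None))

    def _step(self, gen, value):
        # Run the microthread until its next "wait", i.e. its next yield.
        try:
            events = gen.send(value)
        except StopIteration:
            return
        # Cost is O(len(events)): only this microthread's events are added.
        group = [fileobj for fileobj, _ in events]
        for fileobj, mask in events:
            self.selector.register(fileobj, mask, (gen, group))
        self.registered += len(group)

    def run(self):
        while self.ready or self.registered:
            while self.ready:
                self._step(*self.ready.popleft())
            if not self.registered:
                break
            for key, _ in self.selector.select():
                gen, group = key.data
                if not group:
                    continue  # this microthread was already woken up
                # Remove all events of this wait, again O(len(group)).
                for fileobj in group:
                    self.selector.unregister(fileobj)
                self.registered -= len(group)
                group.clear()
                self.ready.append((gen, key.fileobj))
        self.selector.close()


def echo_upper(sock):
    # One microthread per client: read, process, write, start again.
    while True:
        yield [(sock, selectors.EVENT_READ)]   # "wait()" on a single event
        data = sock.recv(1024)
        if not data:
            sock.close()
            return
        sock.sendall(data.upper())


def demo():
    a, b = socket.socketpair()
    replies = []

    def client(sock):
        sock.sendall(b"hello")
        yield [(sock, selectors.EVENT_READ)]
        replies.append(sock.recv(1024))
        sock.close()

    sched = Scheduler()
    sched.spawn(echo_upper(b))
    sched.spawn(client(a))
    sched.run()
    return replies
```

Calling demo() runs one echo microthread and one client microthread
over a socket pair until both have finished. Note that there is no
add/remove API visible to the microthreads: yielding the events is the
only thing they do.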

[Actually, I think that the simplicity of the wait(events=[]) interface
over any add/remove/callback APIs is an argument in favor of the
"microthread-looking" approach in general, though I know that it's a
very subjective topic.]

[I have experimented myself with a greenlet-based system giving wrapper
functions for os.read()/write() and socket.recv()/send(), and in this
style of code we tend to simply spawn new greenlets all the time.  Each
one looks like an infinite loop doing a single simple job: read some
data, process it, write the result somewhere else, start again.  (The
loops are not really infinite; e.g. if sockets are closed, an exception
is generated, and it causes the greenlet to exit.)  So far I've managed
to always wait on a *single* event in each greenlet, but sometimes it
was a bit contrived and being able to wait on 2-3 events would be
handy.]
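
As an illustration of the "2-3 events" case, here is a small sketch of
one loop that waits on two sockets at once. It does not use greenlets:
a generator stands in for the greenlet, and drive() is a made-up,
single-microthread stand-in for a real scheduler, built directly on
select.select(). The names merge_inputs, drive and demo_two_events are
hypothetical.

```python
import select
import socket


def merge_inputs(left, right, sink):
    # One loop, one simple job: collect whatever arrives on either socket.
    open_socks = [left, right]
    while open_socks:
        ready = yield list(open_socks)   # wait on up to two read events
        data = ready.recv(1024)
        if data:
            sink.append(data)
        else:
            # End of stream on this socket: stop waiting on it.
            open_socks.remove(ready)
            ready.close()


def drive(gen):
    # Minimal stand-in for a scheduler running a single microthread.
    try:
        events = gen.send(None)
        while True:
            readable, _, _ = select.select(events, [], [])
            events = gen.send(readable[0])
    except StopIteration:
        pass


def demo_two_events():
    left_r, left_w = socket.socketpair()
    right_r, right_w = socket.socketpair()
    left_w.sendall(b"a")
    left_w.close()
    right_w.sendall(b"b")
    right_w.close()
    sink = []
    drive(merge_inputs(left_r, right_r, sink))
    return sink
```

The loop ends by itself once both sockets have reached end of stream,
which matches the "not really infinite" loops described above.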

A bientot,

Armin.