Python threading?

Alex Martelli aleax at aleax.it
Tue Sep 24 09:33:00 CEST 2002


Mark Hammond wrote:
        ...
> I would have thought that a brave call.  If you are performing lots of
> IO and need many many connections, you are almost certainly better off
> using an "asynchronous" approach, much like asyncore.  With a large

Yes BUT -- as of course you know, select, on Windows, can only deal with
sockets.  Other kinds of asynchronous I/O require a completely different
architecture on Windows than on Unixoid systems -- file I/O, specifically,
performs best with Windows' own asynchronous-I/O-with-callbacks-on-
completion arrangement, one very reminiscent of old VMS systems.  Even
for sockets, I benchmarked, back in NT 3.something times, that "native"
(async-with-messages) winsock was vastly more scalable than "BSD-emulating
sockets and select" (not sure whether that's still true these days).

twisted.internet does have multiple implementations of the Reactor
design pattern, including among others both a portable one based
on select and a Windows-specific one with a specialized message loop
at its core (and others yet, such as ones working with GUI frameworks'
events, poll in lieu of select, etc) -- I haven't examined the
implementation in detail, but that does seem like the only sane approach
if you need servers that are BOTH cross-platform AND highly scalable.

> number of threads (where "large" can be as low as single figures) the OS
> overhead of switching will be greater than your internal overhead of a
> more complicated design.

I guess so, depending of course on the OS -- threading is more and
more likely to be well-optimized these days, but still the context
switch has to cost something compared to a "cooperative" asynchronous
architecture you can implement within one thread (or a very few
threads of your own -- twisted lets you do that, too).
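The "switch only at known points" idea can be illustrated within a single
thread using plain generators (an illustrative round-robin scheduler,
nobody's production code):

```python
from collections import deque

def run_cooperatively(*tasks):
    """Round-robin over generators: each task runs until it yields,
    i.e. it switches only at points it chooses.  No preemption, hence
    no OS context-switch cost and nothing to lock between yields."""
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))
            queue.append(task)        # still alive: reschedule it
        except StopIteration:
            pass                      # finished: drop it
    return trace

def worker(name, steps):
    for i in range(steps):
        yield "%s-%d" % (name, i)     # explicit switch point
```

Running run_cooperatively(worker("a", 2), worker("b", 2)) interleaves the
two tasks deterministically -- which is exactly what makes this style
easier to reason about than preemptive threads.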


> A "one thread per connection" model is doomed from the start -

Hear, hear.

> therefore, in most applications, I would consider 5 threads a large
> number - particularly when all are actively doing something.  OTOH, if
> you are willing to accept that your first version will accept a max of
> ~20 connections, the brain-power saved by using threads beats the
> additional OS overhead by a mile ;)

I think it only LOOKS simple if you don't look into it deeply enough.
If the threads have almost no need to cooperate or share resources,
then you may well be right.  If the cooperation needs are limited
enough that you can smoothly deal with them with Python "Queue"
instances, you still have a chance to be right.  But generally, I
find that Ousterhout was right -- I'm sure you're familiar with his
talk "Why Threads Are a Bad Idea (for most purposes)" (I believe the
slides can still be found on the net)... his point is more about the
brain-power saved (or consumed, by debugging in a just-too-hard
environment, with transient errors, race conditions, etc, etc) by
using event-driven (i.e., async) approaches rather than threading,
than about performance and scalability.

Python may not yet support event-driven programming quite as smoothly
and seamlessly as Ousterhout's Tcl, but, thanks to such developments
as twisted, I think we're drawing progressively closer.  With a solid
enough underlying framework to dispatch events for you, I think event
driven programming (perhaps with a few simple specialized threads off
to one side, interfacing to some slow external resource, and getting
work requests off a Queue, posting results back to a Queue...) can
be made conceptually simpler as well as more scalable than threading.
After all, in event-driven / async programming, ONE thing is happening
at a time -- you only "switch" at known points, and between two such
points you need not worry about what is or isn't atomic, locking, &c.
It's also easier to debug, I think...
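The "few specialized threads off to one side" arrangement is easy to
sketch with the standard library (the slow_resource function is a
made-up stand-in for whatever blocking external service you wrap):

```python
import threading
import queue   # this module was spelled "Queue" in the Python of 2002

def slow_resource(x):
    # Stand-in for a blocking external resource (database, legacy API...).
    return x * 2

def worker(requests, results):
    """Side thread: pull work requests off one Queue, post results
    back on another.  The event-driven main loop never blocks; all
    thread coordination is confined to the two Queues."""
    while True:
        item = requests.get()
        if item is None:              # sentinel: shut down cleanly
            break
        results.put(slow_resource(item))

requests, results = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(requests, results))
t.start()
for n in (1, 2, 3):
    requests.put(n)                   # the "main loop" posts work...
requests.put(None)
t.join()                              # ...and later collects results
```

Because the Queues do all the locking internally, the main loop and the
worker never share mutable state directly -- which is what keeps this
hybrid conceptually simple.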


Alex