threads

Eric Lee Green eric at linux-hw.com
Tue Apr 13 14:05:08 EDT 1999


I am working on a program that has a) a main thread that accepts
incoming socket connections, and b) an array of threads, each of which
sucks on a Queue to get its socket/address pair and which sets a flag
and goes back to sucking when it's finished processing. That is, a
permanent set of threads, each sucking on its own queue waiting for
incoming connections, rather than the temporary per-request threads
that the standard SocketServer classes create. (I do subclass the
SocketServer class, though: __init__ and handle_request were the only
two methods I had to override! I also added a couple more methods to
deal with allocating and starting threads.)
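A minimal sketch of that design in modern Python 3, where the 1999 modules `SocketServer` and `Queue` are now named `socketserver` and `queue`. It overrides `process_request()`, which `handle_request()` and `serve_forever()` call, rather than `handle_request()` itself as described above. The class names `PooledTCPServer` and `EchoHandler`, and the round-robin fallback used when every worker is busy, are hypothetical illustrations, not the author's code.

```python
import queue
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: echoes one chunk of data back to the client."""

    def handle(self):
        self.request.sendall(self.request.recv(1024))


class PooledTCPServer(socketserver.TCPServer):
    """Hypothetical TCPServer served by a fixed pool of long-lived worker
    threads, each blocking on its own Queue."""

    def __init__(self, server_address, handler_class, num_workers=4):
        # Set up the pool first so server_close() works even if bind fails.
        self.queues = [queue.Queue() for _ in range(num_workers)]
        self.available = [True] * num_workers   # one "idle" flag per worker
        self.flag_lock = threading.Lock()       # guards flags and hand-off
        self._next = 0                          # round-robin fallback index
        self.workers = [
            threading.Thread(target=self._worker, args=(i,), daemon=True)
            for i in range(num_workers)
        ]
        for t in self.workers:
            t.start()
        super().__init__(server_address, handler_class)

    def _worker(self, i):
        while True:
            item = self.queues[i].get()         # block until a job arrives
            if item is None:                    # sentinel from server_close()
                return
            request, client_address = item
            try:
                self.finish_request(request, client_address)
            except Exception:
                self.handle_error(request, client_address)
            finally:
                self.shutdown_request(request)
                with self.flag_lock:
                    # Advertise as idle only if no other job is already queued.
                    if self.queues[i].empty():
                        self.available[i] = True

    def process_request(self, request, client_address):
        # Runs in the listening thread: hand the socket to a worker and return.
        with self.flag_lock:
            idle = [i for i, free in enumerate(self.available) if free]
            if idle:
                i = idle[0]
            else:                               # all busy: queue round-robin
                i = self._next % len(self.queues)
                self._next += 1
            self.available[i] = False
            self.queues[i].put((request, client_address))

    def server_close(self):
        for q in self.queues:
            q.put(None)                         # wake each worker so it exits
        for t in self.workers:
            t.join()
        super().server_close()


if __name__ == "__main__":
    with PooledTCPServer(("127.0.0.1", 0), EchoHandler, num_workers=2) as srv:
        threading.Thread(target=srv.serve_forever, daemon=True).start()
        for msg in (b"ping", b"pong"):
            with socket.create_connection(srv.server_address) as conn:
                conn.sendall(msg)
                print(conn.recv(1024))
        srv.shutdown()
```

This mirrors how the standard `ThreadingMixIn` hands each request to a thread (the worker calls `finish_request()` and then `shutdown_request()`), except that the threads here are created once and reused.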

The only thing that is worrying me is the state of the Python threads
implementation under Linux libc6 (GNU libc). I keep having nightmarish
thoughts about what happens if the process scheduler kicks a thread out
and another one in halfway through Python's internal 'guts' updating an
object in the internal object depository, thereby corrupting things
badly. 
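For reference, a small sketch assuming CPython 3: its global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so a thread switch cannot interrupt a built-in operation such as `list.append` halfway through. Four threads appending to one shared list (the names `shared` and `appender` are illustrative) end up with every item present and the list intact.

```python
import threading

shared = []  # one list object mutated from several threads


def appender(tag, n):
    for i in range(n):
        shared.append((tag, i))   # a single built-in call on the list


threads = [threading.Thread(target=appender, args=(t, 10_000)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 40000: no appends lost, list structure intact
```

This protects the integrity of the object itself, not a multi-step sequence of operations performed on it.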

So: 
  1) Do I have to set a semaphore on ALL data structures that are
     altered? For example, a slave thread sets a flag in an array to let
     the master know that it's available, then goes and starts sucking
     on the Queue for its next job. When the master sends it a new job
     on the Queue, the master clears the flag first so that it knows that
     this thread is no longer available (until the thread sets the flag
     later signalling otherwise, right?). What happens if the process
     scheduler kicks in halfway through setting that flag?! Do I get
     data corruption? (In "C" all that happens is that the master thread
     doesn't see the flag being set until the next time through, no big
     deal). 
      
     My question is this: how "atomic" are basic Python variable operations?
     I understand about protecting transactions on variables with semaphores
     (thus if I want to add 5 to a counter accessible from other threads,
     I want to protect it with a semaphore so that another thread adding 5
     to that counter can't "lose" its addition due to a context switch).
     The question is whether the Python object depository is protected. 

  2) Are there any other "gotchas" I need to worry about, other than the
     obvious ones that go with any multithreaded program in any language?
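A sketch of the counter case from question 1, assuming CPython 3 and the standard `threading` module (the names `counter`, `counter_lock` and `add_five` are illustrative). Under CPython, a single store such as `flags[i] = True` replaces one list element in one step, so it cannot corrupt the list; at worst another thread sees the old value until its next pass, much as in C. A compound update like `counter += 5` reads, adds and writes back, so it needs a lock.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_five(times):
    global counter
    for _ in range(times):
        # "counter += 5" is a read, an add and a write; another thread can
        # run between them, so the read-modify-write is guarded by the lock.
        with counter_lock:
            counter += 5


threads = [threading.Thread(target=add_five, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 with the lock; without it, updates could be lost
```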

I must say that this program is turning out to be a lot easier than I
expected, other than my concerns above. The core classes take up
barely two pages of printout!  I'd still be working on the first class
if I'd tried doing it in C++. 

The most interesting thing is that this looks like it will turn out to
be the fastest way to build this kind of server on Linux. The aio_
(asynchronous I/O) approach doesn't perform as well on Linux: since
Linux does not yet support the aio_ system calls, the libc6 library
spawns an entire thread for each asynchronous operation, and spawning
a thread is far more time-consuming than sending a message to an
already-existing thread. Besides, since I'm not serving files with
this server, the aio_ approach wouldn't work anyhow. The only real
problems are the same ones that face Medusa on Linux: Linux, like
most Unixes, has a low per-process file handle limit.
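On Unix, Python's `resource` module can inspect that limit, and raise it, from inside the server. A minimal sketch, assuming a Unix system where an unprivileged process may raise its soft limit up to, but not beyond, the hard limit set by the administrator:

```python
import resource

# Current per-process limit on open file descriptors (sockets count too).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft:", soft, "hard:", hard)

# Try to raise the soft limit to the hard limit.
try:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
except (ValueError, OSError) as exc:
    # Some systems reject very large or unlimited values; keep the old limit.
    print("could not raise limit:", exc)

print("now:", resource.getrlimit(resource.RLIMIT_NOFILE))
```

Raising the hard limit itself normally requires root privileges.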

--
Eric Lee Green         eric at linux-hw.com     http://www.linux-hw.com/~eric
 Defend the 2nd Amendment. Fight for the right to arm bears. 



