Sockets, threads, and killing servers

Thu Jan 10 15:39:03 EST 2002

On Wed, 9 Jan 2002 17:20:50 -0500, Joshua Muskovitz <joshm at taconic.net> wrote:
>> 1) Subclass the ThreadingTCPServer class so that instead of calling
>>    socket.accept() and blocking until a connection comes in, it will
>>    call select.select() with a timeout; at the end of the timeout, it
>>    will check a "Quit now?" flag and if the flag is set, will close the
>>    connection and exit. If the quit flag is not set, go back into the
>>    select.select() for another N seconds.
>
>This is very similar to the solution we developed at my last job.  It worked
>well, but it means that there is a non-deterministic delay between setting
>the "self destruct" flag, and the server actually stopping.  So long as this
>doesn't bother you, it should work well.
>
>For us, it was a concern in that the service was supposed to be highly
>reliable, so it needed to be restarted quickly in case it crashed.  But
>there was a small race condition between the shutdown of the old process and
>the startup of the new, especially because of contention for owning the
>server's TCP port.

I wonder if this could be solved by the alternate implementation I
considered and hinted at in one comment in my overridden get_request()
method:

.
.
.
            # Select with a timeout, then poll a quit flag. An alternate
            # approach would be to let the master thread "wake us up" with
            # a socket connection.
.
.
.

What I was talking about was that instead of overriding get_request() to
use select.select() for polling, one could do something like this:

.
.
.
    def get_request(self):
        # The base class returns accept()'s (socket, client_address) pair
        request, client_address = \
            SocketServer.ThreadingTCPServer.get_request(self)
        self.lock.acquire()
        time_to_quit = self.QuitFlag
        self.lock.release()
        if time_to_quit:
            self.close_request(request)  # That was the "wake up now" request
            raise TimeToQuit             # Get out now
        # If we reach this point, it was a real request coming in
        return request, client_address
.
.
.

This code would then become the get_request method for the
ResponsiveThreadingTCPServer class instead of the method in my earlier
post. Of course, one would also eliminate that part of the __init__()
method that dealt with the length of the timeout. The way one would shut
this server down would be only slightly more complicated. Instead of:

.
.
.
        # Tell the server it's time to shut down
        self.server.lock.acquire()
        self.server.QuitFlag = 1
        self.server.lock.release()
        print "Waiting for server to shut down (could take several seconds)..."
        self.thread.join()
        print "Exiting now."
.
.
.

in the Master Control Thread code, one would have:

.
.
.
        # Tell the server it's time to shut down
        self.server.lock.acquire()
        self.server.QuitFlag = 1
        self.server.lock.release()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect(('localhost', self.port))  # Wake the server out of accept()
        sock.close()
        print "Waiting for server to shut down (should be almost instant)..."
        self.thread.join()
        print "Exiting now."
.
.
.

This also works when select() is not available. Its main disadvantage,
from my point of view, is that it's easier to misuse: if you forget to
make a connection to the serving thread after you set its QuitFlag, it
will sit there blocking on accept() until the next time someone tries to
access it. That (real) connection attempt will then be accepted but
instantly closed, which is almost certainly Not What You Want.
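
Putting the fragments above together, here is a minimal self-contained
sketch of the wake-up-by-connection idea. It is written for Python 3,
where the module is spelled socketserver, so it is not the code from my
earlier post. TimeToQuit, lock and QuitFlag are the names used above;
WakeableServer, EchoHandler, serve_until_quit() and stop() are
hypothetical names made up for this example.

```python
import socket
import socketserver
import threading


class TimeToQuit(Exception):
    """Raised from get_request() once the quit flag has been seen."""


class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: echo one line back to the client."""

    def handle(self):
        self.wfile.write(self.rfile.readline())


class WakeableServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True   # ease quick restarts on the same port
    daemon_threads = True        # handler threads never block exit

    def __init__(self, address, handler):
        socketserver.ThreadingTCPServer.__init__(self, address, handler)
        self.lock = threading.Lock()
        self.QuitFlag = False

    def get_request(self):
        # Wait for a connection, then check the flag before using it.
        request, client_address = \
            socketserver.ThreadingTCPServer.get_request(self)
        with self.lock:
            time_to_quit = self.QuitFlag
        if time_to_quit:
            self.close_request(request)   # the "wake up now" request
            raise TimeToQuit
        return request, client_address

    def serve_until_quit(self):
        try:
            while True:
                self.handle_request()
        except TimeToQuit:
            pass
        finally:
            self.server_close()


def stop(server, thread, timeout=5.0):
    """Set the quit flag, then make the one connection that wakes it."""
    with server.lock:
        server.QuitFlag = True
    try:
        socket.create_connection(server.server_address[:2],
                                 timeout=timeout).close()
    except OSError:
        pass   # a real client got there first and the server is gone
    thread.join(timeout)
```

One would run serve_until_quit() in its own thread and call
stop(server, thread) from the Master Control Thread. Keeping the flag
and the wake-up connection together in a single stop() function is one
way to make the "forgot to connect" mistake harder to commit.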

So there are two alternate ways of doing this. One is to loop on
select() with a timeout, occasionally polling a quit flag; the other is
to always block on accept() with no timeout, checking the quit flag
every time a connection wakes up the server, and for the Master Control
Thread to wake up the server when it's time to quit by making a
connection to it.

I hope that's clear enough. If it's not, I'll be more than happy to try
to explain it. Any opinions on the relative merits of these two
approaches? I tend to lean towards the first, since it's less prone to
error in how it's used (the second can hang for a long time if you
forget to wake up the server by making that one last connection). On the
other hand, the second approach has the advantage of exiting instantly
once it's time to quit.

Although now that I think about it, how about a hybrid approach that
combines the best features of both? Put a longer timeout (say 60
seconds) on the select, so it doesn't have to poll very often, but have
the Master Control Thread make a connection to the server in order to
wake it up once the QuitFlag has been set, so it will exit instantly.
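
A rough sketch of that hybrid, again in Python 3 and under some
assumptions: to keep the select() call visible it uses a bare listening
socket rather than ThreadingTCPServer, and HybridListener, echo,
poll_timeout, serve_until_quit() and stop() are hypothetical names
invented for the example. Only lock and QuitFlag come from the code
above.

```python
import select
import socket
import threading


def echo(conn, addr):
    """Hypothetical per-connection handler: echo one chunk back."""
    with conn:
        conn.sendall(conn.recv(1024))


class HybridListener:
    def __init__(self, address, poll_timeout=60.0):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.sock.bind(address)
        self.sock.listen(5)
        self.address = self.sock.getsockname()[:2]
        self.poll_timeout = poll_timeout
        self.lock = threading.Lock()
        self.QuitFlag = False

    def serve_until_quit(self, handle):
        try:
            while True:
                # Returns early on a connection, else after poll_timeout.
                readable, _, _ = select.select(
                    [self.sock], [], [], self.poll_timeout)
                with self.lock:
                    time_to_quit = self.QuitFlag
                if time_to_quit:
                    if readable:
                        # Most likely the wake-up connection; drop it.
                        conn, _ = self.sock.accept()
                        conn.close()
                    break
                if readable:
                    conn, addr = self.sock.accept()
                    threading.Thread(target=handle, args=(conn, addr),
                                     daemon=True).start()
        finally:
            self.sock.close()

    def stop(self, thread, timeout=5.0):
        with self.lock:
            self.QuitFlag = True
        try:
            # Wake the select() now instead of waiting out the timeout.
            socket.create_connection(self.address, timeout=timeout).close()
        except OSError:
            pass   # listener already closed; the flag alone suffices
        thread.join(timeout)
```

If the wake-up connection is forgotten, the loop still notices the flag
at the next timeout, so the worst case is a delay of poll_timeout
seconds rather than an indefinite hang.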

Comments? Other ideas?

-- 
Robin Munn
rmunn at pobox.com