tstate invalid crash with threads

Ray Loyzaga ray at commsecure.com.au
Tue Jun 8 17:17:56 CEST 1999


I have been playing with a multithreaded TCP-based server for a while,
and during some serious stress testing I repeatedly hit a problem:
the interpreter crashes with "Fatal Python error:
PyThreadState_Delete: invalid tstate".

I have stripped the server down to a pretty basic SocketServer-based
process which replies to a client request. It still shows the bug, and
sometimes also dies with a segmentation fault.
I have tested the same code on Linux 2.0.36/Python 1.5.1, Linux 2.0.36/
Python 1.5.2 and Solaris 2.6/Python 1.5.2; all crash the same way.
Unfortunately, to produce the crash I need to run 12 clients on various
other machines over 100Mbit Ethernet. The server usually dies somewhere
between 150k and 1.4M transactions (which can take an hour), though
sometimes it crashes within 15 minutes.

It appears to be a subtle race in PyThreadState_Delete. Interestingly,
if I uncomment the small sleep in "handle" in the server, i.e. make the
server slower, it seems to work forever (4M transactions before I gave
up). I think the problem only appears when threads are being created
and destroyed quickly in parallel.
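The workload pattern described above (many short-lived threads created and destroyed concurrently) can be exercised without any networking. The following is a minimal Python 3 sketch of such a churn loop, assuming only the standard `threading` module; `churn_threads`, `rounds` and `width` are hypothetical names. It illustrates the workload only and is not expected to reproduce the 1.5.x crash on a modern interpreter.

```python
import threading


def churn_threads(rounds, width):
    """Start `width` short-lived threads per round, join them all, repeat.

    Returns the total number of worker bodies that ran.
    """
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        # "counter += 1" is a read-modify-write, so guard it with a lock.
        with lock:
            counter += 1

    for _ in range(rounds):
        threads = [threading.Thread(target=worker) for _ in range(width)]
        for t in threads:
            t.start()
        for t in threads:
            # Each thread exits and has its per-thread state torn down here,
            # which is the create/destroy churn the post suspects.
            t.join()
    return counter
```

For example, `churn_threads(rounds=1000, width=12)` starts and retires 12,000 threads, twelve at a time (the width of 12 is an assumption loosely mirroring the 12 client machines).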

Any comments would be appreciated; note, however, that the following is
seriously stripped down and isn't supposed to do anything useful.
The queue is there to make sure that I'm not keeling over due to thread
exhaustion. It isn't required; I just wanted to know whether the
process goes thread-crazy before crashing (it doesn't).
All of this was done on an otherwise idle system with 384MB of memory,
so resource exhaustion seems unlikely. I've seen this bug before,
but that particular application didn't really require threads,
so I converted it to a synchronous server by using an ordinary
TCPServer.
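In the server as posted, the emptiness check on the queue happens in the listening thread, but `threadq.get()` only runs later inside the new worker thread, so the check and the take are not atomic and more threads than tokens can be started (the extras block in `get()`). A minimal Python 3 sketch of a strict bound, swapping in `threading.BoundedSemaphore` for the token queue, might look like this; `spawn_bounded` and `MAX_WORKERS` are hypothetical names, and the limit of 200 simply mirrors the post's queue size.

```python
import threading

# Hypothetical limit, mirroring the 200 tokens placed in the post's Queue.
MAX_WORKERS = 200
slots = threading.BoundedSemaphore(MAX_WORKERS)


def spawn_bounded(work, *args):
    """Run work(*args) in a new thread only if a slot is free.

    The slot is taken in the *calling* thread before the worker starts,
    so there is no gap between "is a slot free?" and "take the slot".
    Returns the started Thread, or None if every slot was busy.
    """
    if not slots.acquire(blocking=False):
        return None

    def run():
        try:
            work(*args)
        finally:
            # Always hand the slot back, even if work() raised.
            slots.release()

    t = threading.Thread(target=run)
    t.start()
    return t
```

A caller such as `process_request` would close or reject the connection itself when `spawn_bounded` returns `None`.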

The server is:
from SocketServer import ThreadingTCPServer, StreamRequestHandler
from socket import *
import Queue
import cPickle
import time
from threading import *

server_address = ('', 8000)

cnt = 0L

threadq = Queue.Queue(200)
for i in range(200):
        threadq.put(i)

class BankHandler(StreamRequestHandler):
        def handle(self):
                global cnt

                val = self.connection.recv(1024)
                if (cnt % 10000) == 0:
                        print time.ctime(time.time()), cnt
                cnt = cnt + 1
                #time.sleep(0.01)
                self.connection.send('OK')

class MyServer(ThreadingTCPServer):
        def process_request(self, request, client_address):
                """Start a new thread to process the request."""
                import thread

                if threadq.empty():
                        self.handle_error(request, client_address)
                else:
                        thread.start_new_thread(self.finish_request,
                                        (request, client_address))

        def finish_request(self, request, client_address):
                """Finish one request by instantiating RequestHandlerClass."""
                t = threadq.get()
                self.RequestHandlerClass(request, client_address, self)
                threadq.put(t)


        def server_bind(self):
                """Called by constructor to bind the socket.
                May be overridden.
                """
                self.socket.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
                self.socket.bind(self.server_address)

s = MyServer(server_address, BankHandler)

s.serve_forever()
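For comparison, here is a hedged Python 3 sketch of the same server using the standard `socketserver` module's own hooks instead of overriding `process_request` and `server_bind`. `allow_reuse_address` and `daemon_threads` are real `socketserver` class attributes; the lock-protected counter is an addition, since the post's `cnt = cnt + 1` runs unsynchronised across handler threads and the count can drift (although that alone would not explain an interpreter abort). The token queue is deliberately omitted.

```python
import socketserver
import threading


class BankHandler(socketserver.BaseRequestHandler):
    """Read one request and answer 'OK', like the post's handler."""

    def handle(self):
        self.request.recv(1024)
        # Count transactions under a lock shared by all handler threads.
        with self.server.count_lock:
            self.server.count += 1
            n = self.server.count
        if n % 10000 == 0:
            print(n, "transactions")
        self.request.sendall(b"OK")


class MyServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True  # replaces the setsockopt(SO_REUSEADDR) override
    daemon_threads = True       # handler threads won't block interpreter exit

    def __init__(self, address, handler):
        # Set up shared state before the base class binds and listens.
        self.count = 0
        self.count_lock = threading.Lock()
        super().__init__(address, handler)
```

To run it the way the post does: `with MyServer(("", 8000), BankHandler) as server: server.serve_forever()`.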

*********************************************************************************************
And the client is (the file client_speed contains a single line with .01 in it):

from socket import *
import string
import cPickle
import sys
import time
from select import select

TIMEOUT=30
trans = 0L
while 1:
        try:
                f = open('client_speed')
                delay = string.atof(f.readline())
                f.close()
                sok = socket(AF_INET, SOCK_STREAM)
                sok.connect(('bozo', 8000))
                sok.send('Hello')
                x = select([sok.fileno()], [], [], TIMEOUT)
                if x != ([],[],[]):
                        data = sok.recv(2048)
                        if (trans % 10240) == 0:
                                print time.ctime(time.time()), trans, data
                else:
                        print 'Recv timeout'
                sok.close()
        except:
                print 'failure %s'%(sys.exc_info()[1])
        trans = trans + 1
        time.sleep(delay)
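A hedged Python 3 sketch of the same client loop follows. It swaps the `select()` call for a socket-level timeout via `socket.create_connection(..., timeout=...)`, and takes the delay as a parameter instead of re-reading the client_speed file; `transact` and `run_forever` are hypothetical names.

```python
import socket
import time


def transact(host, port, payload=b"Hello", timeout=30.0):
    """Connect, send `payload`, and return the reply bytes.

    Raises OSError (socket.timeout is a subclass) on connect, send,
    or receive failure, including when no reply arrives within `timeout`.
    """
    with socket.create_connection((host, port), timeout=timeout) as sok:
        sok.sendall(payload)
        return sok.recv(2048)


def run_forever(host, port, delay=0.01):
    """Issue one transaction every `delay` seconds, reporting failures."""
    trans = 0
    while True:
        try:
            data = transact(host, port)
            if trans % 10240 == 0:
                print(time.ctime(), trans, data)
        except OSError as exc:  # refused, reset, timed out, ...
            print("failure", exc)
        trans += 1
        time.sleep(delay)
```

Each load machine would then run `run_forever("bozo", 8000)`.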



