mysterious hangs in socket code
I've been running a small, multi-threaded program to retrieve web pages today. The entire program appears to hang whenever a slow DNS operation is in progress, even though there is no application-level coordination between the threads. The motivation comes from http://www.python.org/sf/591349, but I ended up writing a similar small test script, which I've attached.

When I run this program with Python 2.1, it produces a steady stream of output -- urls and the time it took to load them. Most of the pages take less than a second, but some take a very long time. If I run this program with Python 2.2 or 2.3, it produces little bursts of output, then pauses for a long time, then repeats.

I believe that the problem relates to DNS lookups, but not in a way I fully understand. If I connect gdb to any of the threads while the program is hung, it is always inside getaddrinfo().

My first theory was that the socketmodule stopped wrapping DNS lookups in Py_BEGIN/END_ALLOW_THREADS calls when the IPv6 changes were integrated. But if I restore these calls -- see http://www.python.org/sf/604210 -- I don't see any change in behavior. The program still hangs periodically.

One possibility is that the Linux getaddrinfo() is thread-safe, but only by way of a lock that allows just one request to be outstanding at a time. (A small timing sketch to probe this follows the attached script.) Not sure what the other possibilities are, but the current behavior is awful.

Jeremy

---------------------------------------------------------------------

import httplib
import Queue
import random
import sys
import threading
import time
import traceback
import urlparse

headers = {"Accept": "text/plain, text/html, image/jpeg, image/jpg, "
                     "image/gif, image/png, */*"}

class URLThread(threading.Thread):

    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue
        self._stopevent = threading.Event()

    def stop(self):
        self._stopevent.set()

    def run(self):
        while not self._stopevent.isSet():
            self.fetch()

    def fetch(self):
        url = self._queue.get()
        t0 = time.time()
        try:
            self._fetch(url)
        except:
            etype, value, tb = sys.exc_info()
            L = ["Error occurred fetching %s\n" % url,
                 "%s: %s\n" % (etype, value),
                 ]
            L += traceback.format_tb(tb)
            sys.stderr.write("".join(L))
        t1 = time.time()
        print url, round(t1 - t0, 2)

    def _fetch(self, url):
        parts = urlparse.urlparse(url)
        host = parts[1]
        path = parts[2]
        h = httplib.HTTPConnection(host)
        h.connect()
        h.request("GET", path, headers=headers)
        r = h.getresponse()
        r.read()
        h.close()

urls = """\
http://www.andersen.com/
http://www.google.com/
http://www.google.com/images/logo.gif
http://www.microsoft.com/
http://www.microsoft.com/homepage/gif/bnr-microsoft.gif
http://www.microsoft.com/homepage/gif/1ptrans.gif
http://www.microsoft.com/library/toolbar/images/curve.gif
http://www.yahoo.com/
http://www.sourceforge.net/
http://www.slashdot.org/
http://www.kuro5hin.org/
http://www.intel.com/
http://www.aol.com/
http://www.amazon.com/
http://www.cnn.com/
http://money.cnn.com/
http://www.expedia.com/
http://www.tripod.com/
http://www.hotmail.com/
http://www.angelfire.com/
http://www.excite.com/
http://www.verisign.com/
http://www.riaa.com/
http://www.enron.com/
http://www.securityspace.com/
http://www.directv.com/
http://www.att.com/
http://www.qwest.com/
http://www.covad.com/
http://www.sprint.com/
http://www.mci.com/
http://www.worldcom.com/
"""

urls = [u for u in urls.split("\n") if u]

REPEAT = 10
THREADS = 8

class RandomQueue:

    def __init__(self, L):
        self.list = L

    def get(self):
        return random.choice(self.list)

if __name__ == "__main__":
    urlq = RandomQueue(urls)
    sys.setcheckinterval(10)
    threads = []
    for i in range(THREADS):
        t = URLThread(urlq)
        t.start()
        threads.append(t)
    while 1:
        try:
            time.sleep(30)
        except:
            break
    print "Shutting down threads..."
    for t in threads:
        t.stop()
    for t in threads:
        t.join()
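The serialized-getaddrinfo() theory can be probed from pure Python. The sketch below is a minimal test, not part of the attached script, and the hostnames are arbitrary examples: it times the same set of getaddrinfo() calls once sequentially and once from concurrent threads. If the C library really guards getaddrinfo() with a single global lock, the concurrent run should take roughly as long as the sequential one, rather than roughly the time of the slowest single lookup.

import socket
import threading
import time

# Arbitrary example hostnames; any set of names not already in a local
# resolver cache will do.
HOSTS = ["www.google.com", "www.yahoo.com", "www.cnn.com", "www.amazon.com"]

def lookup(host):
    # One DNS lookup; errors are reported but do not abort the test.
    try:
        socket.getaddrinfo(host, 80)
    except socket.error, err:
        print "error resolving %s: %s" % (host, err)

def timed_run(concurrent):
    # Time all lookups, either one after another or in parallel threads.
    threads = [threading.Thread(target=lookup, args=(h,)) for h in HOSTS]
    t0 = time.time()
    if concurrent:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for t in threads:
            t.start()
            t.join()
    return time.time() - t0

if __name__ == "__main__":
    print "sequential: %.2fs" % timed_run(0)
    print "concurrent: %.2fs" % timed_run(1)

One caveat: the second run benefits from any caching done by the resolver or a nearby nameserver, so the comparison is fairest with fresh hostnames for each run, or with the measurements repeated a few times.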
participants (4)
- Aahz
- barry@python.org
- Guido van Rossum
- Jeremy Hylton