[Twisted-Python] web.client blowing up on non-fully qualified 301s
Hello,

I have run into an odd problem. I am not sure if it is my issue or Twisted's; any help would be appreciated.

Under at least some circumstances, twisted.web.client seems to 1) be unable to follow a 301, and 2) throw an unhandled exception when trying to follow a 301. A specific example and the resulting error are given below:

    from twisted.internet import defer
    from twisted.web import client
    from twisted.internet import reactor

    class HTTPGetter(client.HTTPClientFactory):
        protocol = client.HTTPPageGetter

    class Fetcher:
        def __init__(self,client_factory = HTTPGetter):
            self.factory = client_factory

        def download(self,host,port,url):
            f = self.factory(url)
            f.deferred.addCallback(self.downloadFinished).addErrback(self.downloadFailed)
            k = reactor.connectTCP(host, port, f, timeout=10)
            return f.deferred

        def downloadFinished(self,v):
            print "good"

        def downloadFailed(self, v):
            print "bad"
            print v

    r = Fetcher()
    w = r.download("www.shopzilla.com",80,"/aaaa")
    reactor.callLater(10,reactor.stop)
    reactor.run()

This results in:

    Unhandled error in Deferred:
    Traceback (most recent call last):
      File "/usr/local/lib/python2.4/site-packages/twisted/internet/posixbase.py", line 226, in mainLoop
        self.runUntilCurrent()
      File "/usr/local/lib/python2.4/site-packages/twisted/internet/base.py", line 541, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 494, in resolveAddress
        d.addCallbacks(self._setRealAddress, self.failIfNotConnected)
      File "/usr/local/lib/python2.4/site-packages/twisted/internet/defer.py", line 182, in addCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/usr/local/lib/python2.4/site-packages/twisted/internet/defer.py", line 307, in _runCallbacks
        self.result = callback(self.result, *args, **kw)
      File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 498, in _setRealAddress
        self.doConnect()
      File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 520, in doConnect
        connectResult = self.socket.connect_ex(self.realAddress)
      File "<string>", line 1, in connect_ex
    exceptions.TypeError: an integer is required

This is apparently because doConnect assumes a good address and so does not trap TypeError. The bad address that doConnect blows up on, ('', None) for (host, port), slips in via handleStatus_301 in twisted.web.client.

The example site (Shopzilla.com) sends a URL in the 301 Location header that is not fully qualified. Given such a URL, handleStatus_301 appears to fail because it relies on getting the host and port from the Location URL, and they are not present in it. It therefore passes ('', None) to reactor.connectTCP when it tries to follow the redirect, leading to the error above.

My kludge fix to handleStatus_301 is given below: if the host or port is missing, I steal it from the transport, which should be correct since that transport was just used to get the page. I am running with this now, with no errors.

Is this a Twisted issue? If so, is my fix reasonable? If it is not a Twisted issue, what am I doing wrong?

Thanks,
Keith

    def handleStatus_301(self):
        l = self.headers.get('location')
        if not l:
            self.handleStatusDefault()
        url = l[0]
        if self.followRedirect:
            scheme, host, port, path = \
                _parse(url, defaultPort=self.transport.getPeer().port)
            self.factory.setURL(url)
            #following 4 lines added kad to fix apparent issue with 301 to a url that is not fully qualified
            if self.factory.host == '':
                self.factory.host = self.transport.addr[0]
            if self.factory.port == None:
                self.factory.port = self.transport.addr[1]
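
For comparison, here is a minimal sketch of another way to handle a Location value that is not fully qualified: resolve it against the URL that was originally requested before extracting the host and port. This is not Twisted code and not a proposed patch. It uses only the Python 3 standard library (urllib.parse) rather than Twisted's _parse, the function name resolve_redirect is hypothetical, and the "/bbbb" value in the usage line is a made-up Location, since the actual header sent by the site is not shown above.

```python
from urllib.parse import urljoin, urlsplit


def resolve_redirect(request_url, location):
    """Resolve a redirect target against the URL that was requested.

    `location` may be absolute or relative. A relative value inherits
    the scheme, host, and port of `request_url`.
    Returns (absolute_url, host, port).
    """
    absolute = urljoin(request_url, location)
    parts = urlsplit(absolute)
    # Fall back to the scheme's default port when none is given explicitly.
    default_port = 443 if parts.scheme == "https" else 80
    return absolute, parts.hostname, parts.port or default_port


if __name__ == "__main__":
    # Hypothetical relative Location value, for illustration only.
    print(resolve_redirect("http://www.shopzilla.com/aaaa", "/bbbb"))
```

Resolving first means the host and port are never empty when the connection is made, so the fallback to the transport's address would not be needed under this approach.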
participants (2)
- Keith Dutton
- Stephen Thorne