Client.agent - PartialDownloadError - cannot determine cause
I am new with Twisted so chances are that this is *not* a bug. If that is the case, I would very much appreciate it if you help me understand what I am doing incorrectly.

I have some very simple code to just download a particular page via HTTP. The problem I'm running into is that I get PartialDownloadErrors for some sites, while others load fine. And when I examine the HTTP response, it looks like the page fully downloaded.

The traceback and logging information have been very unhelpful. I cannot determine what exactly is causing the error.

    import sys
    from twisted.internet.task import react
    from twisted.python.log import err, startLogging
    from twisted.web.client import Agent, BrowserLikeRedirectAgent, readBody
    from twisted.web.http_headers import Headers
    from twisted.internet import reactor
    from twisted.internet.ssl import ClientContextFactory

    def cbBody(r):
        print "Response body:"
        print r

    def cbRequest(response):
        print "Received response"
        d = readBody(response)
        d.addCallbacks(cbBody, err)
        return d

    def err(e):
        try:
            e.raiseException()
        except Exception, err:
            pass
            # display HTTP response
            # print err.response
        e.printTraceback()
        sys.stderr.write(str(e))

    def main(reactor):
        startLogging(sys.stdout)
        agent = BrowserLikeRedirectAgent(Agent(reactor))
        d = agent.request(
            "GET", b"http://www.google.com",
            Headers({'User-Agent': ['Twisted Web Client Example']}),
            None)
        d.addCallbacks(cbRequest, err)
        return d

    react(main)

Output:

    2015-04-13 14:13:11-0400 [-] Log opened.
    2015-04-13 14:13:11-0400 [-] Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x10303f560>
    2015-04-13 14:13:11-0400 [HTTP11ClientProtocol,client] Received response
    2015-04-13 14:13:11-0400 [HTTP11ClientProtocol,client] Traceback (most recent call last):
    2015-04-13 14:13:11-0400 [HTTP11ClientProtocol,client] Failure: twisted.web.client.PartialDownloadError: 200 OK
    2015-04-13 14:13:11-0400 [HTTP11ClientProtocol,client] [Failure instance: Traceback (failure with no frames): <class 'twisted.web.client.PartialDownloadError'>: 200 OK
    2015-04-13 14:13:11-0400 [HTTP11ClientProtocol,client] Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x10303f560>
    2015-04-13 14:13:11-0400 [-] Main loop terminated.

Thanks!
Chris
On Monday, 13 April 2015, 14:29:01, Chris Drane wrote:
I am new with Twisted so chances are that this is *not* a bug. If that is the case, I would very much appreciate it if you help me understand what I am doing incorrectly.
Just googled the error message out of curiosity:
http://stackoverflow.com/questions/29423986/twisted-giving-twisted-web-clien...
-- Wolfgang
Yes I actually started that thread, and I wasn't fully satisfied with the answer. I feel like I was given workarounds rather than addressing what seems to be a problem with how either HTTP or TCP is handled. Please correct me if I'm mistaken (I probably am).
I revisited the Stack Overflow post and it appears that I am able to receive HTTP responses properly now. Doing so required me to launch WireShark and copy a browser's actual headers. I also had to add a ContentDecoderAgent. Specifically it was the lack of an Accept-Encoding that was causing the problem. I added ["gzip, deflate, sdch"] and it seemed to do the trick. I really do think that there should be an easier way to do this. I also don't understand why the Agent couldn't have properly interpreted the initial response. Thanks for everyone's time.
On Apr 13, 2015, at 15:30, Chris Drane <csdrane@gmail.com> wrote:
I revisited the Stack Overflow post and it appears that I am able to receive HTTP responses properly now. Doing so required me to launch WireShark and copy a browser's actual headers. I also had to add a ContentDecoderAgent.
This is a good idea, but it's also a bit of an accident. Some sites may give you length prefixes with a content decoder, but there will still be some that provoke a PartialDownloadError no matter what you do. There are edge cases in HTTP where you just cannot know if you have received everything the server sent, and many sites still operate that way.
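To make the edge case above concrete, here is a rough sketch (not Twisted's actual code) of how an HTTP/1.1 client can tell whether a response body has a definite end. The function name `body_framing` and the plain-dict header format are made up for illustration, and the sketch ignores bodiless cases such as HEAD requests and 204/304 responses. Only the "until-close" case leaves the client unable to tell a complete body from a dropped connection, which is the situation described above.

```python
def body_framing(headers):
    """Classify how an HTTP/1.1 response body is delimited.

    `headers` is a hypothetical plain dict of header name -> value.
    Returns "chunked", "content-length", or "until-close".
    """
    normalized = {
        name.lower(): value.strip().lower() for name, value in headers.items()
    }
    # Transfer-Encoding may list several codings; only the final one matters.
    codings = [
        token.strip()
        for token in normalized.get("transfer-encoding", "").split(",")
        if token.strip()
    ]
    if codings and codings[-1] == "chunked":
        # The terminating zero-length chunk marks the end of the body.
        return "chunked"
    if "content-length" in normalized and not codings:
        # The client can count bytes and knows when it has everything.
        return "content-length"
    # No reliable length: the body ends when the server closes the
    # connection, so a complete body and a truncated one look the same.
    return "until-close"


if __name__ == "__main__":
    print(body_framing({"Content-Length": "1024"}))
    print(body_framing({"Transfer-Encoding": "chunked"}))
    print(body_framing({"Content-Type": "text/html"}))
```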
Specifically it was the lack of an Accept-Encoding that was causing the problem. I added ["gzip, deflate, sdch"] and it seemed to do the trick.
Hrm. You have to set this header manually, even using a ContentDecoderAgent? That sounds like a bug.
I really do think that there should be an easier way to do this.
Absolutely. From the very beginning, Agent was not really supposed to be a "high level" HTTP API, but those working on it sort of ran out of energy halfway through. If you're writing applications today, you probably should use treq instead (https://github.com/twisted/treq), but longer-term the plan is to absorb treq or something very much like it into Twisted itself.

Looking through the ticket tracker, though, I see that the plan ... does not seem to be very well documented. I can't find the "high level" ticket anywhere, and the closest thing I can find with just a few minutes of searching the tracker is this: https://twistedmatrix.com/trac/ticket/3987#comment:29 which closes a ticket about a "high level interface" by talking about a "mid-level API". We clearly need a higher level API within Twisted itself.
I also don't understand why the Agent couldn't have properly interpreted the initial response.
It did properly interpret the initial response. You may or may not have received the whole body, and that's exactly what PartialDownloadError means. If responses which might be the whole response OR might be the server breaking the connection on you are acceptable, handle that error and just treat the body that you have received so far as complete. This is perfectly acceptable in many cases.
Thanks for everyone's time.
Thanks for using Twisted! Sorry that this experience was somewhat rocky. I hope you'll stick around and help us improve it. -glyph
participants (3)
- Chris Drane
- Glyph
- Wolfgang Rohdewald