Re: [Twisted-Python] Problem fetching page with getPage
"Steve" == ssteinerX@gmail com
writes:

Steve> On Jan 2, 2010, at 9:34 AM, Terry Jones wrote:
Steve> > In any case, it looks like the problem is not in the setup of the
Steve> > request. Can anyone offer a reason why httplib might be able to
Steve> > fetch the page whereas getPage receives an error? I'm stumped.
Steve>
Steve> I've had to debug things like this recently and I have two
Steve> suggestions:

Hi Steve,

Thanks for the helpful reply - I can now make the call successfully. The
difference turned out to be that httplib puts a Host: hostname:port header
into its calls, whereas getPage uses just Host: hostname. Plus there was
something else going on in some other code I'm using that made this a
problem (it was calculating a signature based on host:port).

Steve> 1> Recreate the headers and make it work with curl. Curl won't add
Steve> anything to your headers and such and you'll be sure that you're
Steve> getting the result you want with completely stripped down case.

At least on my machine (curl 7.18.0 on Linux Ubuntu/Hardy) it adds a
User-agent, an Accept: */*, and also the Host header.

Steve> 2> Get Charles http://www.charlesproxy.com/ if you're on OS X. It
Steve> rocks. Otherwise, get one of the Windows tools (sorry, no recos
Steve> from me on that), and watch exactly what goes by.

It's available for Linux & Windows too. I tried it, but didn't make it work
fully when sending requests from the command line (with SSL, spoofing DNS,
etc). So in the end I just used netcat -l -p 443 and changed to HTTP to see
what was being sent. I wouldn't have thought of doing that without your
suggestion, so thanks a lot for the tip.

Terry
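For anyone without netcat to hand, the same "listen and look at the raw request" trick can be sketched in Python. This is only an illustration, not what was run in this thread: the function names are made up, it uses Python 3's http.client (the successor to httplib) as the client, and it listens on an OS-chosen local port rather than 443. Because that port is not the HTTP default, the captured request should show the Host: hostname:port form.

```python
import http.client
import socket
import threading


def capture_one_request(server_sock, sink):
    """Accept a single connection and store the raw request bytes in sink."""
    conn, _addr = server_sock.accept()
    with conn:
        data = b""
        # Read until the blank line that ends the HTTP request headers.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        sink.append(data)
        # Send a minimal reply so the client does not wait for one.
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n")


def sniff_http_client_request(path="/"):
    """Return (port, raw bytes) for a GET sent by http.client to a local listener."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    captured = []
    worker = threading.Thread(
        target=capture_one_request, args=(server_sock, captured)
    )
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    client.request("GET", path)
    client.getresponse().read()
    client.close()

    worker.join()
    server_sock.close()
    return port, captured[0]


if __name__ == "__main__":
    port, raw = sniff_http_client_request()
    # Print the request exactly as it went over the wire.
    print(raw.decode("latin-1"))
```

Pointing a different client at the same kind of listener and comparing the two dumps is a quick way to spot header differences like the one described above.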
On Jan 2, 2010, at 4:14 PM, Terry Jones wrote:
> Thanks for the helpful reply - I can now make the call successfully. The
> difference turned out to be that httplib puts a Host: hostname:port
> header into its calls, whereas getPage uses just Host: hostname. Plus
> there was something else going on in some other code I'm using that made
> this a problem (it was calculating a signature based on host:port).
I'm glad that you tracked this down! According to comments on http://twistedmatrix.com/trac/ticket/886, this problem was addressed in the new HTTP client implementation. Have you considered using the new twisted.web.client.Agent instead of getPage?
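To illustrate why the form of the Host header matters to signature code like that mentioned in the quoted message, here is a minimal sketch. The signing scheme (HMAC-SHA256 over method, host and path), the helper names, the key, and the example.com host and port are all assumptions made up for the example; the actual scheme in Terry's code is not shown in this thread.

```python
import hashlib
import hmac


def host_header(hostname, port=None):
    """Build a Host header value, appending the port only when one is given."""
    return hostname if port is None else "%s:%d" % (hostname, port)


def sign_request(secret, method, host, path):
    """Hypothetical scheme: HMAC-SHA256 over method, host and path."""
    message = "\n".join([method, host, path]).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


secret = b"not-a-real-key"  # placeholder key, for illustration only

# The signing code assumes host:port ...
signed_over = host_header("example.com", 8443)
# ... but the request goes out with the bare hostname.
sent_as = host_header("example.com")

client_sig = sign_request(secret, "GET", signed_over, "/resource")
server_sig = sign_request(secret, "GET", sent_as, "/resource")

# A verifier that recomputes the signature from the Host header it
# actually received ends up with a different value from the client's.
print(hmac.compare_digest(client_sig, server_sig))
```

The point of the sketch is only that both sides have to agree on the exact Host string; which form is right depends on the service being called.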
participants (2): Glyph Lefkowitz, Terry Jones