HTTPLIB - Problem retrieving a page
phd at phd.pp.ru
Sat Nov 24 10:36:23 CET 2001
On Fri, Nov 23, 2001 at 07:46:46PM +0000, Colin Meeks wrote:
> I am running the following code to retrieve pages and strip out some
> however I have noticed that some sites do not work, even though the correct
> URL is given. I can verify it works by testing it in my browser. The below
> code gives a 404 error.
> import urlparse, httplib, urllib
I tested the site a bit with netcat and found that it responds with a 404
error to HTTP/0.9 requests. HTTP/1.0 and HTTP/1.1 requests are OK.
So the reason is (or at least may be) that your version of the Python library
does not send the HTTP version with the request. Try to verify this: look at
the request line and headers that httplib sends, using a debugging proxy or a
fake HTTP server for testing (there are a number of pure-Python tools for
this task).
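
For illustration, here is a minimal sketch of such a fake HTTP server. It
assumes Python 3, where httplib is named http.client; the helper names
capture_one_request and request_line_sent_by_client are made up for this
example and are not part of any library. It listens on a local port, records
the raw request bytes, and returns the first line, so you can see whether the
client appended an HTTP version.

```python
import socket
import threading
import http.client  # Python 3 name for httplib (assumption: Python 3 is used)


def capture_one_request(server_sock, captured):
    """Accept one connection, record the raw request, send a minimal reply."""
    conn, _ = server_sock.accept()
    with conn:
        conn.settimeout(2)
        data = b""
        try:
            # Read until the blank line that ends the headers. An HTTP/0.9
            # request has no headers, so fall back to the timeout in that case.
            while b"\r\n\r\n" not in data:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                data += chunk
        except socket.timeout:
            pass
        captured.append(data)
        conn.sendall(
            b"HTTP/1.0 200 OK\r\n"
            b"Content-Length: 2\r\n"
            b"Connection: close\r\n\r\n"
            b"ok"
        )


def request_line_sent_by_client(path="/"):
    """Return the request line that http.client sends for a GET of `path`."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    captured = []
    worker = threading.Thread(
        target=capture_one_request, args=(server_sock, captured)
    )
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    client.request("GET", path)
    client.getresponse().read()
    client.close()

    worker.join()
    server_sock.close()
    # The request line is everything before the first CRLF.
    return captured[0].split(b"\r\n", 1)[0].decode("ascii")


if __name__ == "__main__":
    print(request_line_sent_by_client("/index.html"))
```

A request line that ends with a version token, such as "HTTP/1.0" or
"HTTP/1.1", is a versioned request; a bare "GET /path" with nothing after
the path is the HTTP/0.9 form. Point your own client code at the fake
server instead of http.client to see which form it sends.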
Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru
Programmers don't die, they just GOSUB without RETURN.