HTTPLIB - Problem retrieving a page
export at sendme.cz
Sat Nov 24 17:48:52 CET 2001
Oleg Broytmann <phd at phd.pp.ru> wrote in message news:<mailman.1006594710.8985.python-list at python.org>...
> On Fri, Nov 23, 2001 at 07:46:46PM +0000, Colin Meeks wrote:
> > I am running the following code to retrieve pages and strip out some
> > details; however, I have noticed that some sites do not work, even though
> > the correct URL is given. I can verify that it works by testing it in my
> > browser. The code below gives a 404 error.
> > import urlparse, httplib, urllib
> > UseURL='http://www.meeks.ca/index.htm'
> I tested the site a bit and found that it responds with a 404 error to
> HTTP/0.9 requests (I tested it with netcat). HTTP/1.0 and HTTP/1.1 requests
> are OK. So the reason is (or at least may be) that your version of the Python
> library does not send the HTTP version with the request. Try to verify this:
> look at the headers that httplib sends, using a debugging proxy or a fake
> HTTP server for testing (there are a number of pure-Python tools for this
> task).
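Oleg's suggestion to look at what httplib actually sends can be tried with a throwaway fake server. The sketch below is an illustration, not part of the original thread: it is written for Python 3 (where httplib is named http.client), `show_request_head` is a hypothetical helper name, and the server only accepts a single connection on 127.0.0.1.

```python
import http.client
import socket
import threading


def show_request_head(path="/index.htm"):
    """Send a GET through http.client to a one-shot local fake server.

    Returns the raw request head (request line plus headers) exactly as
    the fake server received it.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    listener.settimeout(5)  # don't wait forever if the client never connects
    port = listener.getsockname()[1]
    captured = {}

    def serve_once():
        conn, _ = listener.accept()
        with conn:
            data = b""
            # Read until the blank line that ends the request head.
            while b"\r\n\r\n" not in data:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                data += chunk
            captured["head"] = data.split(b"\r\n\r\n", 1)[0].decode("latin-1")
            # Minimal valid reply so the client side completes normally.
            conn.sendall(b"HTTP/1.1 200 OK\r\n"
                         b"Content-Length: 2\r\n"
                         b"Connection: close\r\n\r\nok")

    server = threading.Thread(target=serve_once)
    server.start()
    try:
        client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        client.request("GET", path)
        client.getresponse().read()
        client.close()
    finally:
        server.join()
        listener.close()
    return captured.get("head", "")


if __name__ == "__main__":
    print(show_request_head())
```

The first printed line is the request line; with http.client it ends in HTTP/1.1, and a Host header follows it. A request line with no version at all would be the HTTP/0.9 form that the site rejects.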
Use HTTP/1.1, which httplib.HTTPConnection sends. Here is a working sample:
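import urlparse, httplib
UseURL = 'http://www.meeks.ca/index.htm'
(scheme, server, path, param, query, fragment) = urlparse.urlparse(UseURL)
h = httplib.HTTPConnection(server)  # HTTPConnection speaks HTTP/1.1
h.request('GET', path or '/')  # sends "GET <path> HTTP/1.1" and a Host header
r = h.getresponse()
data = r.read()  # get the raw HTML
h.close()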
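For readers on Python 3, where httplib became http.client and urlparse became urllib.parse, the same approach might look like the sketch below. The function names `request_target` and `fetch` are illustrative only, not library APIs; the sketch assumes a plain http:// URL and does not follow redirects.

```python
import http.client
from urllib.parse import urlsplit


def request_target(url):
    """Return (host, port, target) for an http:// URL.

    The target is the path plus any query string, defaulting to "/".
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URLs")
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, parts.port or 80, target


def fetch(url, timeout=10):
    """GET a URL over HTTP/1.1 and return (status, body bytes)."""
    host, port, target = request_target(url)
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # request() sends "GET <target> HTTP/1.1" plus a Host header.
        conn.request("GET", target)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

Usage would be `status, data = fetch('http://www.meeks.ca/index.htm')`. Keeping the URL splitting in its own function makes it easy to check that the query string is not silently dropped, which the Python 2 sample above does not handle.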