HTTPLIB - Problem retrieving a page

Sat Nov 24 11:48:52 EST 2001

Oleg Broytmann <phd at phd.pp.ru> wrote in message news:<mailman.1006594710.8985.python-list at python.org>...
> On Fri, Nov 23, 2001 at 07:46:46PM +0000, Colin Meeks wrote:
> > I am running the following code to retrieve pages and strip out some
> > details,
> > however I have noticed that some sites do not work, even though the correct
> > URL is given.  I can verify it works by testing it in my browser. The below
> > code gives a 404 error.
> > 
> > import urlparse, httplib, urllib
> > UseURL='http://www.meeks.ca/index.htm'
> 
>    I tested the site a bit and found that it responds Error404 to HTTP/0.9
> requests (I tested it with netcat). HTTP/1.0 and HTTP/1.1 requests are ok.
>    So the reason is (or at least may be) that your version of Python library
> does not send HTTP version with request. Try to verify this (that is, look
> into headers that httplib sends; use debugging proxy or fake http server
> for testing - there are a number of pure python tools for this task).
> 
> Oleg.

You need to send the request as HTTP/1.1, which httplib.HTTPConnection does.

Here is a working sample:
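import httplib, urlparse

UseURL = 'http://www.meeks.ca/index.htm'
(scheme, server, path, param, query, fragment) = urlparse.urlparse(UseURL)
path = path or '/'                  # an empty path must be requested as '/'
if query:
    path = path + '?' + query       # keep the query string, if any

h = httplib.HTTPConnection(server)  # HTTPConnection sends HTTP/1.1 requests
h.putrequest('GET', path)           # also adds the Host header
h.putheader('Accept', 'text/html, text/plain')
h.endheaders()
r = h.getresponse()

print r.status, r.reason            # expect 200 OK
data = r.read()                     # get the raw HTML
print data
h.close()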
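
If you want to follow Oleg's suggestion and see exactly what the library puts on the wire, something like the following may help. It is only a rough sketch: the helper name capture_request, the throwaway server on 127.0.0.1 and the canned "ok" reply are my own assumptions for illustration, not part of the original code. The import fallback is there so the same sketch runs on newer Pythons, where httplib is called http.client.

```python
# Sketch: capture the raw request that httplib / http.client sends.
import socket
import threading

try:
    import http.client as httplib   # newer Pythons
except ImportError:
    import httplib                  # Python 2 name, as in the sample above


def capture_request():
    """Send one GET to a throwaway local server; return (raw request, body)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    captured = []

    def serve_once():
        conn, _ = listener.accept()
        data = b''
        # Read until the blank line that ends the request headers.
        while b'\r\n\r\n' not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        captured.append(data)
        # Canned reply (hypothetical), just enough to satisfy the client.
        conn.sendall(b'HTTP/1.1 200 OK\r\n'
                     b'Content-Length: 2\r\n'
                     b'Connection: close\r\n\r\nok')
        conn.close()

    t = threading.Thread(target=serve_once)
    t.start()

    # Same calls as in the working sample, pointed at the local server.
    h = httplib.HTTPConnection('127.0.0.1', port)
    h.putrequest('GET', '/index.htm')
    h.putheader('Accept', 'text/html')
    h.endheaders()
    body = h.getresponse().read()
    h.close()

    t.join()
    listener.close()
    return captured[0], body


raw, body = capture_request()
print(raw.decode('latin-1'))   # request line and headers as sent
```

The first line printed is the request line, so you can see at a glance whether an HTTP version is being sent. For a quicker look without a fake server, h.set_debuglevel(1) before the request makes the connection print what it sends and receives.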

Ladislav