[Python-bugs-list] [ python-Bugs-405939 ] HTTPConnection Host hdr wrong w/ proxy

noreply@sourceforge.net noreply@sourceforge.net
Fri, 16 Mar 2001 09:56:40 -0800


Bugs item #405939, was updated on 2001-03-04 21:44
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=405939&group_id=5470

Category: Python Library
Group: None
Status: Open
Priority: 5
Submitted By: Ernie Sasaki (esasaki)
>Assigned to: Greg Stein (gstein)
Summary: HTTPConnection Host hdr wrong w/ proxy

Initial Comment:
The HTTPConnection class's putrequest() method sends 
the wrong Host header if self._http_vsn == 11 and a 
proxy is in use.

Currently the following is done in httplib.py revision 
1.33:

if self.port == HTTP_PORT:
    self.putheader('Host', self.host)
else:
    self.putheader('Host', "%s:%s" % (self.host, self.port))

However, if a proxy is in use, self.host is the proxy 
address, and url contains the "realhost", which should 
be in the Host header. (urllib does the right thing 
here, but it uses the HTTP class and not 
HTTPConnection. It doesn't see this problem because 
HTTP/1.0 is then used and no Host header is sent 
automatically.)

Instead the following is correct:

match = httpRE.search(url)
if match:
    self.putheader('Host', match.group(1))
else:
    if self.port == HTTP_PORT:
        self.putheader('Host', self.host)
    else:
        self.putheader('Host', "%s:%s" % (self.host, self.port))

where:

httpRE = re.compile(r'(?i)http://([^/]+)')
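
For illustration, the proposed logic can be pulled out into a standalone function so the Host value can be checked without opening a connection. This is only a sketch under stated assumptions: host_header_value is a hypothetical helper, not part of httplib, and it uses match() rather than search() so that only a URL that begins with http:// is treated as absolute.

```python
import re

HTTP_PORT = 80  # default HTTP port, mirroring the constant used in httplib

# Same pattern as in the report: captures host[:port] of an absolute http URL.
httpRE = re.compile(r'(?i)http://([^/]+)')


def host_header_value(url, host, port):
    """Return the value to send in the Host header (hypothetical helper).

    url  -- the Request-URI passed to putrequest()
    host -- the host the connection was opened to (the proxy, if one is used)
    port -- the port the connection was opened to
    """
    match = httpRE.match(url)
    if match:
        # Absolute URL: the real host is in the URL, not in the connection.
        return match.group(1)
    if port == HTTP_PORT:
        return host
    return "%s:%s" % (host, port)


if __name__ == "__main__":
    # Request sent through a proxy: the Host value names the origin server.
    print(host_header_value("http://www.python.org/doc/", "proxy.example.com", 8080))
    # Direct request on a non-default port: the port is appended.
    print(host_header_value("/index.html", "localhost", 8000))
```

Note that, like the pattern in the report, this copies any user:password@ prefix in the URL into the result.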


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-03-05 15:44

Message:
Logged In: YES 
user_id=21627

1 and 2 are probably good reasons why httplib should follow
suit. 3 is not an option, since RFC 2616 states that a client
MUST send a Host header in a 1.1 request, even though a server
MUST ignore it if an absolute URI is present; otherwise I'd
agree that it would be best not to send any at all.

As for 4, I agree that the proxy can change the Request-URI.
However, to conform to HTTP 1.1, I think it also needs to
update the Host: header accordingly.

I'd like to get some reports of actual problems caused by
this bug. If there are any, what is the specific proxy
software being used, etc.?

On the user/password issue: So far, httplib does not deal with
the request URI at all. The issue is how to process an
absoluteURI that contains a userinfo. It would be clearly
wrong to copy the userinfo into the Host: header, as your
code would do. What is not clear to me is whether RFC 2616
allows userinfo to be present in a Request-URI.
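
One way to avoid copying the userinfo into the Host: header is to drop everything up to the last "@" in the authority part of the URL. The sketch below assumes a modern Python standard library (urllib.parse, which postdates the httplib revision discussed here); host_without_userinfo is a hypothetical helper name, and it does not settle whether RFC 2616 permits userinfo in a Request-URI.

```python
from urllib.parse import urlsplit


def host_without_userinfo(url):
    """Return host[:port] from an absolute URL, without any userinfo.

    Hypothetical helper for illustration. Returns an empty string when
    the URL has no authority part (for example, a plain path).
    """
    netloc = urlsplit(url).netloc
    # rpartition keeps the text after the last "@"; if there is no "@",
    # the whole netloc is returned unchanged.
    return netloc.rpartition("@")[2]


if __name__ == "__main__":
    print(host_without_userinfo("http://user:secret@www.python.org:8080/doc/"))
    print(host_without_userinfo("http://www.python.org/"))
```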

----------------------------------------------------------------------

Comment By: Ernie Sasaki (esasaki)
Date: 2001-03-05 12:31

Message:
Logged In: YES 
user_id=139439

Well, my not very good answers are (notwithstanding your 
quote):

1). This is what Netscape 4.7 does.

2). This is what urllib's open_http does.

3). I'd rather you didn't send a Host header at all than send 
a wrong one.  It just makes no sense to me to give the 
origin server a Host header that relates to the proxy's 
address. How would the virtual host mechanism (mentioned in 
the section you quote) ever work thru a proxy then?? You 
need the concept of a host different from what is specified 
in the Request-URI.

4). I speculate (with only secondhand evidence) that a 
proxy can change the absoluteURI to an absolute path when 
passing it on to the origin server. In that case, the Host 
header would indeed determine the host.

As for the patch being incomplete: no part of httplib 
does any special handling of an embedded user/password. 
It is assumed that you'll take care of sending the 
Authorization header yourself.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-03-05 00:39

Message:
Logged In: YES 
user_id=21627

Why is that a bug? RFC 2616, section 5.2, states

# If Request-URI is an absoluteURI, the host is part of the 
# Request-URI. Any Host header field value in the request 
# MUST be ignored.

So in the presence of an absolute URI, the Host: field does
not matter. It is certainly nicer to fill in the right Host:
field, but I'd like to understand the problem before
applying a fix. Your patch is incomplete, IMO: it does not
deal with the user/password part in the URL.
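
The rule quoted from section 5.2 can be sketched from the receiving side as follows. This is an illustration only: effective_host is a hypothetical function, headers is assumed to be a plain dict, and urllib.parse from the modern standard library is used to detect an absolute URI.

```python
from urllib.parse import urlsplit


def effective_host(request_uri, headers):
    """Return the host a server would use for a request, or None.

    Hypothetical sketch of the RFC 2616 section 5.2 rule: an absolute
    Request-URI wins, and the Host header is consulted only otherwise.
    """
    parts = urlsplit(request_uri)
    if parts.scheme and parts.netloc:
        # absoluteURI: the host is part of the Request-URI and any
        # Host header value is ignored.
        return parts.netloc
    # Not an absoluteURI: fall back to the Host header, if present.
    return headers.get("Host")


if __name__ == "__main__":
    # The Host header names the proxy, but the absolute URI takes priority.
    print(effective_host("http://www.python.org/doc/",
                         {"Host": "proxy.example.com:8080"}))
    # With an absolute path, the Host header decides.
    print(effective_host("/doc/", {"Host": "www.python.org"}))
```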

----------------------------------------------------------------------
