[Python-bugs-list] [ python-Bugs-405939 ] HTTPConnection Host hdr wrong w/ proxy
noreply@sourceforge.net
Fri, 08 Mar 2002 11:39:23 -0800
Bugs item #405939, was opened at 2001-03-05 05:44
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=405939&group_id=5470
Category: Python Library
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 6
Submitted By: Ernie Sasaki (esasaki)
Assigned to: Jeremy Hylton (jhylton)
Summary: HTTPConnection Host hdr wrong w/ proxy
Initial Comment:
The HTTPConnection class' putrequest() method is
incorrect if self._http_vsn == 11 and a proxy is in
use.
Currently the following is done in httplib.py revision
1.33:
    if self.port == HTTP_PORT:
        self.putheader('Host', self.host)
    else:
        self.putheader('Host', "%s:%s" % (self.host, self.port))
However if a proxy is in use, self.host is the proxy
address, and url contains the "realhost" which should
be in the Host header. (urllib does the right thing
here but it uses the HTTP class and not
HTTPConnection. It doesn't see this problem because
then HTTP/1.0 is used and no Host header is sent
automatically.)
Instead the following is correct:
    match = httpRE.search(url)
    if match:
        self.putheader('Host', match.group(1))
    else:
        if self.port == HTTP_PORT:
            self.putheader('Host', self.host)
        else:
            self.putheader('Host', "%s:%s" % (self.host, self.port))
where:
httpRE = re.compile(r'(?i)http://([^/]+)')
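The logic proposed above can be sketched as a standalone function so it can be tried outside httplib. This is only an illustration: host_header_value is a hypothetical helper name, not anything in httplib, and HTTP_PORT is assumed to be 80.

```python
import re

HTTP_PORT = 80  # assumption: same value as httplib's HTTP_PORT
httpRE = re.compile(r'(?i)http://([^/]+)')


def host_header_value(url, host, port):
    """Hypothetical helper: pick the value for the Host header.

    url  -- the Request-URI passed to putrequest()
    host -- the host the connection was opened to (the proxy, if one is used)
    port -- the port the connection was opened to
    """
    match = httpRE.search(url)
    if match:
        # Absolute URI: the real host is embedded in the URL itself.
        return match.group(1)
    if port == HTTP_PORT:
        return host
    return "%s:%s" % (host, port)
```

With a proxy at proxy.example.net:8080 (made-up names), a request for http://www.example.com/index.html yields www.example.com rather than the proxy address.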
----------------------------------------------------------------------
>Comment By: Jeremy Hylton (jhylton)
Date: 2002-03-08 19:39
Message:
Logged In: YES
user_id=31392
Fixed in rev 1.45 of httplib.py.
----------------------------------------------------------------------
Comment By: Greg Stein (gstein)
Date: 2001-08-18 10:22
Message:
Logged In: YES
user_id=6501
This looks good. Note that RFC 2616 says the Host header
should reflect the host of the "original URL" (which I
presume means without any proxy consideration).
I plan to optimize the provided code a bit, but will
otherwise follow that pattern. Look for the result in 2.2a3
(will probably miss 2.2a2).
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2001-03-05 23:44
Message:
Logged In: YES
user_id=21627
1 and 2 are probably good reasons why httplib should follow.
3 is no option, since RFC 2616 states that a client MUST
send a Host header in a 1.1 request, even though a server
MUST ignore it if an absolute URI is present; otherwise I'd
agree that it would be best not to send any at all.
As for 4, I agree that the proxy can change the Request-URI.
However, to conform to HTTP/1.1, I think it also needs to
update the Host: header accordingly.
I'd like to get some report on actual problems caused by
this bug. If so, what is the specific proxy software being
used, etc.
On user/password issue: So far, httplib does not deal with
the request URI at all. The issue is how to process an
absoluteURI that contains a userinfo. It would be clearly
wrong to copy the userinfo into the Host: header, as your
code would do. What is not clear to me is whether RFC 2616
allows userinfo to be present in a Request-URI.
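One way to keep userinfo out of the Host header is sketched below. It is not what httplib does: host_without_userinfo is a hypothetical name, and the sketch relies on urllib.parse from Python 3, which postdates this report.

```python
from urllib.parse import urlsplit


def host_without_userinfo(url):
    """Hypothetical helper: return host[:port] from an absolute URL,
    dropping any user:password@ prefix in the authority part."""
    netloc = urlsplit(url).netloc
    # Everything after the last '@' is host[:port]; with no '@' present,
    # rpartition returns the whole netloc as its third element.
    return netloc.rpartition('@')[2]
```

The credentials themselves would still have to be sent separately, for example in an Authorization header supplied by the caller.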
----------------------------------------------------------------------
Comment By: Ernie Sasaki (esasaki)
Date: 2001-03-05 20:31
Message:
Logged In: YES
user_id=139439
Well, my not very good answers are (notwithstanding your
quote):
1). This is what Netscape 4.7 does.
2). This is what urllib's open_http does.
3). I'd rather you didn't send a Host header at all than
send a wrong one. It just makes no sense to me to give the
origin server a Host header that relates to the proxy's
address. How would the virtual host mechanism (mentioned in
the section you quote) ever work thru a proxy then?? You
need the concept of a host different from what is specified
in the Request-URI.
4). I speculate (with only secondhand evidence) that a
proxy can change the absoluteURI to an absolute path when
passing it on to the origin server. In that case, the Host
header would indeed determine the host.
As far as the patch being incomplete: In no part of httplib
does any special handling of an embedded user/password
appear. It is assumed that you'll take care of sending the
Authorization header yourself.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2001-03-05 08:39
Message:
Logged In: YES
user_id=21627
Why is that a bug? RFC 2616, section 5.2, states
# If Request-URI is an absoluteURI, the host is part of the
# Request-URI. Any Host header field value in the request
# MUST be ignored.
So in the presence of an absolute URI, the Host: field does
not matter. It is certainly nicer to fill in the right Host:
field, but I'd like to understand the problem before
applying a fix. Your patch is incomplete, IMO: it does not
deal with the user/password part in the URL.
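The rule quoted from section 5.2 can be illustrated from the receiving side. This is a rough sketch under assumptions, not server code from any real project: effective_host is a hypothetical name, and the sketch uses Python 3's urllib.parse.

```python
from urllib.parse import urlsplit


def effective_host(request_uri, host_header):
    """Hypothetical helper: which host a request is addressed to,
    following the order given in RFC 2616, section 5.2."""
    parts = urlsplit(request_uri)
    if parts.scheme and parts.netloc:
        # Absolute URI: the host comes from the Request-URI and any
        # Host header value is ignored.
        return parts.netloc
    # Not an absolute URI: fall back to the Host header. None here
    # means no host could be determined from the request.
    return host_header
```

Under this rule a wrong Host header is harmless only as long as the absolute URI survives all the way to the server that interprets it.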
----------------------------------------------------------------------