[Patches] [ python-Patches-1185444 ] urllib2 dloads failing through HTTP proxy w/ auth
SourceForge.net
noreply at sourceforge.net
Wed May 4 03:16:00 CEST 2005
Patches item #1185444, was opened at 2005-04-18 14:07
Message generated for change (Comment added) made by jwpye
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1185444&group_id=5470
Category: Library (Lib)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Mike Fleetwood (mfleetwo)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2 dloads failing through HTTP proxy w/ auth
Initial Comment:
When using urllib2 to download through an HTTP proxy that requires
authorisation, a broken HTTP request is sent. The initial request might
work, but subsequent requests sent on the same socket definitely fail.
Problem occurs on Fedora Core 3 with Python 2.3.4. The buggy code still
exists in the Python library in 2.4.1.
Found the problem using yum to download files via my company's Microsoft
ISA web proxy. The proxy requires authorisation. I set the HTTP_PROXY
environment variable to define the proxy like this:
export HTTP_PROXY=http://username:password@proxy.example.com:8080/
Analysis from my yum bugzilla report,
http://devel.linux.duke.edu/bugzilla/show_bug.cgi?id=441, follows:
Location is:
File: urllib2.py
Class: ProxyHandler
Function: proxy_open()
The basic proxy authorisation string is created using
base64.encodestring() and passed to the add_header() method of a Request
object. However, base64.encodestring() specifically adds a trailing '\n',
and when the headers are sent over the socket each is followed by '\r\n'.
The server sees this double newline as the end of the HTTP request and
the rest of the HTTP headers as a second, invalid request.
The broken request looks like this:
GET ...
Host: ...
Accept-Encoding: identity
Proxy-authorization: Basic xxxxxxxxxxxxxxxx
<-- Blank line which shouldn't be there
User-agent: urlgrabber/2.9.2
<-- Blank line ending HTTP request
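The stray blank line can be reproduced outside urllib2. The following is a minimal sketch, not the urllib2 code itself: it assumes Python 3, where base64.encodebytes() is the name for what Python 2 called base64.encodestring(), and it joins the headers by hand as a simplified stand-in for what is written to the socket. The credentials and host are placeholders.

```python
import base64

# Placeholder credentials, for illustration only.
credentials = b"username:password"

# base64.encodebytes() (Python 3) behaves like Python 2's
# base64.encodestring(): the encoded output ends with a newline.
token = base64.encodebytes(credentials).decode("ascii")

headers = [
    ("Host", "www.example.com"),
    ("Proxy-authorization", "Basic " + token),
    ("User-agent", "urlgrabber/2.9.2"),
]

# Simplified stand-in for how the request is serialised:
# every header line is terminated with CRLF.
request = "GET http://www.example.com/ HTTP/1.0\r\n"
request += "".join("%s: %s\r\n" % (name, value) for name, value in headers)
request += "\r\n"

# The repr shows the '\n\r\n' sequence right after the credentials.
print(repr(token))
print(repr(request))
```

In the printed request, the '\n' left on the token followed by the header's own '\r\n' is what the proxy reads as the premature blank line.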
The fix is just to remove the '\n' which base64.encodestring() added
before calling add_header(). Just use the string method strip(), as is
done in the only other location where base64.encodestring() is used in
urllib2.py.
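The proposed fix might look roughly like the sketch below. It is not the actual patch: basic_proxy_auth() is a hypothetical helper, and base64.encodebytes() is again used as the Python 3 equivalent of base64.encodestring().

```python
import base64


def basic_proxy_auth(user_pass):
    """Build a Basic credential value with no trailing newline.

    Hypothetical helper for illustration; not part of urllib2.
    """
    raw = base64.encodebytes(user_pass.encode("ascii")).decode("ascii")
    # strip() removes the trailing '\n' that the encoder appends.
    return "Basic " + raw.strip()


value = basic_proxy_auth("username:password")
print(repr(value))
```

Note that strip() only removes whitespace at the ends of the string. base64.b64encode() emits no newlines at all, so it is an alternative when the input may be long enough for the encoder to wrap its output.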
----------------------------------------------------------------------
Comment By: James William Pye (jwpye)
Date: 2005-05-03 18:15
Message:
Logged In: YES
user_id=1044177
Seems like a valid issue to me.
Each header in HTTP must be followed with a CRLF, not a LFCRLF:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4
And I don't think it constitutes continuation of the
field-content either, as the LF is not followed by at least 1
SP or HT, but rather by a CRLF (per LWS).
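The continuation argument can be checked mechanically. This is a rough sketch under the RFC 2616 rule that a header field continues only when the line break is followed by at least one SP or HT; continues_field() and both sample strings are made up for illustration.

```python
def continues_field(raw, index_after_break):
    """Return True if the character right after a line break is SP or HT,
    i.e. the next line would continue the current header field."""
    return raw[index_after_break:index_after_break + 1] in (" ", "\t")


# Header as sent by the buggy code: a bare LF, then the usual CRLF.
broken = ("Proxy-authorization: Basic xxxxxxxxxxxxxxxx\n\r\n"
          "User-agent: urlgrabber/2.9.2\r\n\r\n")

# A legitimately folded header, for comparison (made-up field name).
folded = "X-Example: first part\r\n second part\r\n\r\n"

lf = broken.index("\n")
crlf = folded.index("\r\n")

print(continues_field(broken, lf + 1))    # what follows the stray LF
print(continues_field(folded, crlf + 2))  # what follows a real fold
```

For the broken request the character after the stray LF is a CR, so the line cannot be read as a folded continuation of the Proxy-authorization field.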
----------------------------------------------------------------------