[Python-Dev] Issue with HTTP basic proxy authentication in urllib2

Tue Jan 20 22:48:37 EST 2004

I have only been scripting in Python for about 3 months
and have mainly been performing HTTP requests to CGI scripts.
I have had a lot of trouble with the POST method and believe I have
tracked this down to the fact that I have to perform Basic proxy
authentication.

When the HTTP header for Basic authentication is created in urllib2,
the username and password are encoded using the encodestring() function
of the base64 module.  This adds a newline '\n' to the end of the returned
encoded string (as documented).
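
The trailing newline is easy to see in isolation.  The following is only a
sketch for current Python 3, where encodestring() no longer exists and
encodebytes() is its replacement; the credentials are made up for the example.

```python
import base64

# Hypothetical credentials, for illustration only.
credentials = b"user:secret"

# encodebytes() (the Python 3 replacement for encodestring()) appends '\n'.
with_newline = base64.encodebytes(credentials)

# b64encode() produces the same encoding without the trailing newline.
without_newline = base64.b64encode(credentials)

print(repr(with_newline))
print(repr(without_newline))
```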

From urllib2.py:

    def proxy_open(self, req, proxy, type):
        orig_type = req.get_type()
        type, r_type = splittype(proxy)
        host, XXX = splithost(r_type)
        if '@' in host:
            user_pass, host = host.split('@', 1)
            if ':' in user_pass:
                user, password = user_pass.split(':', 1)
                user_pass = base64.encodestring('%s:%s' % (unquote(user),
                                                           unquote(password)))
                req.add_header('Proxy-authorization', 'Basic ' + user_pass)

This newline then becomes the end of the HTTP Proxy authentication header:

        Proxy-authorization: Basic asdfSADFwaer%asfdas=\n

When the HTTP request is assembled the required CRLF is added to the end
of each header along with the one extra to indicate the end of the 
request headers.

This would seem OK, except that some HTTP servers appear to interpret the newline
after the proxy authentication header as an extra blank line in the request, thus
missing any data sent in the body of the request.  This is not a problem
with the GET method (obviously) or with a POST that is not sent through
an HTTP proxy server requiring authentication.
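
To illustrate the effect described above, here is a rough sketch that
assembles a request by hand and then splits it the way a lenient parser
(one that accepts a bare LF as a line ending) might.  The URL, credentials,
other headers and body are all invented for the example, and real servers
may of course parse differently.

```python
import base64

# Hypothetical credentials; encodebytes() leaves a trailing '\n' on the token.
token = base64.encodebytes(b"user:secret").decode("ascii")

body = "a=1&b=2"
headers = [
    ("Proxy-authorization", "Basic " + token),
    ("Content-type", "application/x-www-form-urlencoded"),
    ("Content-length", str(len(body))),
]

# CRLF after each header, then one extra CRLF to end the header section.
request = "POST http://example.invalid/cgi-bin/test HTTP/1.0\r\n"
request += "".join("%s: %s\r\n" % (name, value) for name, value in headers)
request += "\r\n" + body

# Assumed lenient parsing: treat both CRLF and bare LF as line endings.
lines = request.replace("\r\n", "\n").split("\n")
first_blank = lines.index("")

print("headers seen by parser:", lines[1:first_blank])
print("everything after the first blank line:", lines[first_blank + 1:])
```

With such a parser the first empty line appears directly after the
Proxy-authorization header, so the remaining headers are taken for the
body and the real body is never found where the server expects it.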

I made the very simple addition to urllib2.py of

        user_pass=user_pass.rstrip()

to remove the trailing '\n' from the encoded username/password before it was
added to the HTTP headers, and everything works without a problem.
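
As a standalone sketch of the same idea (not the actual urllib2 code), the
header value could be built like this in current Python 3.  The function
name, the credentials and the choice of UTF-8 for encoding them are
assumptions made for the example.

```python
import base64
from urllib.parse import unquote


def proxy_auth_value(user, password):
    """Return a 'Basic ...' header value with the trailing newline removed.

    Hypothetical helper, not part of urllib2.
    """
    raw = ("%s:%s" % (unquote(user), unquote(password))).encode("utf-8")
    # encodebytes() appends '\n'; rstrip() removes it, as in the change above.
    return "Basic " + base64.encodebytes(raw).decode("ascii").rstrip()


value = proxy_auth_value("user", "secret")
print(repr(value))
```

Note that encodebytes() also inserts a newline after every 76 characters of
output, which rstrip() does not remove, so for long credentials
base64.b64encode() would be the safer choice since it adds no newlines at all.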

I'm not sure if this is an issue which has been raised before.  I glanced through the
summaries of the python-dev mailing list for the past year and found no mention of it.
Technically I don't see it as a bug in urllib2.py but as an incorrect implementation of
HTTP header parsing by servers (the proxy server may even be the culprit).

Cheers

PS.  If there is any need to discuss this, could I please be CC'd, as I am
not a member of the mailing list.  Thanks
-- 
Ivano Broz
Metabolic Research Unit,
Pigdons Road,
Waurn Ponds 3217,
Victoria Australia
Ph: 61 3 52272195
"the money I save won't buy my youth again"