problem using urllib2: \n

bmiras at yahoo.com bmiras at yahoo.com
Tue Sep 23 12:00:05 EDT 2003


I've got a problem using urllib2 to get a web page.
I'm going through a proxy that requires user/password authentication,
and I'm trying to get a page that itself asks for HTTP authentication.
I'm using Python 2.3.

Here is an example of the code I use:

import urllib2

# Proxy handler
proxy_handler = urllib2.ProxyHandler(
    {"http": "http://proxyuser:proxypassword@myproxy:8050"})

# Site auth handler
site_auth_handler = urllib2.HTTPBasicAuthHandler()
site_auth_handler.add_password("This Realm", "www.mysite.com",
                               "siteuser", "sitepassword")

opener = urllib2.build_opener(site_auth_handler,
                              urllib2.HTTPRedirectHandler,
                              urllib2.HTTPHandler,
                              proxy_handler)
urllib2.install_opener(opener)

req = urllib2.Request('http://www.mysite.com/protectedpage')
page = urllib2.urlopen(req)

I got a 401 error.

Analyzing the request using 'strace' I can see the following request
sent to the proxy:

GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
www.mysite.com\r\nUser-agent:
Python-urllib/2.0a1\r\nProxy-authorization: Basic
bWlyYXNiOm1pcjAz\n\r\nAuthorization: Basic
bWlyYXM6bWlyYXMwMDE=\n\r\n\r\n

As you can see, an additional \n is sent just after the values of
the Proxy-authorization and Authorization fields. I think that in
this case the web server gets only this part:
GET http://www.mysite.com/protectedpage HTTP/1.0\r\nHost:
www.mysite.com\r\nUser-agent:
Python-urllib/2.0a1\r\nProxy-authorization: Basic
bWlyYXNiOm1pcjAz\n\r\n

and so sends me back a 401 error, since I'm not authenticated for the
site.
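
The truncation described above can be reproduced with a small sketch. It
assumes the receiving end accepts a bare \n as a line terminator, as
lenient HTTP parsers often do; the function name visible_headers is made
up for this illustration, and the code is written for Python 3 rather
than the 2.3 used in this post.

```python
import re

# The request as captured with strace, header lines joined back together.
RAW_REQUEST = (
    b"GET http://www.mysite.com/protectedpage HTTP/1.0\r\n"
    b"Host: www.mysite.com\r\n"
    b"User-agent: Python-urllib/2.0a1\r\n"
    b"Proxy-authorization: Basic bWlyYXNiOm1pcjAz\n\r\n"
    b"Authorization: Basic bWlyYXM6bWlyYXMwMDE=\n\r\n"
    b"\r\n"
)


def visible_headers(raw):
    """Return the header names seen before the first empty line.

    Hypothetical lenient parser: both CRLF and bare LF end a line.
    """
    lines = re.split(rb"\r?\n", raw)
    names = []
    for line in lines[1:]:  # lines[0] is the request line
        if line == b"":
            # An empty line marks the end of the header block.
            break
        name, _, _ = line.partition(b":")
        names.append(name.decode("ascii"))
    return names


print(visible_headers(RAW_REQUEST))
```

Under that assumption the stray \n followed by \r\n reads as an empty
line, so the header block ends before the Authorization field is reached.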

I had a look at urllib2.py. I think that base64.encodestring adds
a \n at the end of the string. That is the case in the method
'proxy_open':

    def proxy_open(self, req, proxy, type):
        orig_type = req.get_type()
        type, r_type = splittype(proxy)
        host, XXX = splithost(r_type)
        if '@' in host:
            user_pass, host = host.split('@', 1)
            if ':' in user_pass:
                user, password = user_pass.split(':', 1)
                user_pass = base64.encodestring('%s:%s' % (unquote(user),
                                                          unquote(password)))
                req.add_header('Proxy-authorization', 'Basic ' + user_pass)
        host = unquote(host)
        req.set_proxy(host, type)
   ...

I think it should be:

user_pass = base64.encodestring('%s:%s' % (unquote(user),
                                           unquote(password))).strip()
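
The trailing newline is easy to confirm in isolation. The sketch below is
written for Python 3, where base64.encodestring is named
base64.encodebytes; the credentials and the helper name basic_auth_value
are placeholders for illustration and are not part of urllib2.

```python
import base64


def basic_auth_value(user, password):
    """Build a Basic auth header value with no trailing newline."""
    raw = ("%s:%s" % (user, password)).encode("ascii")
    # encodebytes always appends "\n" to its output; strip() removes it.
    token = base64.encodebytes(raw).strip()
    return "Basic " + token.decode("ascii")


encoded = base64.encodebytes(b"proxyuser:proxypassword")
print(repr(encoded))  # the value ends with a newline character
print(repr(basic_auth_value("proxyuser", "proxypassword")))
```

Note that encodebytes also inserts a newline after every 57 input bytes,
which strip() does not remove. In later Python versions
base64.b64encode never adds newlines and would avoid both cases.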

Does anyone have any other ideas?
Thank you!

Bastien



