urllib2 through basic auth'ed proxy

John J. Lee jjl at pobox.com
Thu Mar 30 18:42:07 EST 2006


Alejandro Dubrovsky <dubrovsky at physics.uq.edu.au> writes:
[...]
> How does one connect through a proxy which requires basic authorisation?
> The following code, stolen from somewhere, fails with a 407:
> 
[...code involving urllib2.ProxyBasicAuthHandler()...]
> Can anyone explain me why this fails, or more importantly, code that would
> work? 

OK, I finally installed squid and had a look at the urllib2 proxy
basic auth support (which I've steered clear of for years despite
doing quite a bit with urllib2).  Seems quite broken.  Appears to have
been broken back in December 2004, with revision 38092 (note there's a
little revision number oddness in the Python SVN repo, BTW:
http://mail.python.org/pipermail/python-dev/2005-November/058269.html):

--- urllib2.py  (revision 38091)
+++ urllib2.py  (revision 38092)
@@ -720,7 +720,10 @@
                     return self.retry_http_basic_auth(host, req, realm)

     def retry_http_basic_auth(self, host, req, realm):
-        user,pw = self.passwd.find_user_password(realm, host)
+        # TODO(jhylton): Remove the host argument? It depends on whether
+        # retry_http_basic_auth() is consider part of the public API.
+        # It probably is.
+        user, pw = self.passwd.find_user_password(realm, req.get_full_url())
         if pw is not None:
             raw = "%s:%s" % (user, pw)
...


That can't be right, can it?  With a proxy, you're always
authenticating yourself for the whole proxy, and you want to look up
the credentials by the proxy's host, not by the URL being requested
(RFC 2617 section 3.2.1).  The ProxyBasicAuthHandler subclass
dutifully passes in the right thing for the host argument, but
AbstractBasicAuthHandler ignores it, which means that it never finds
the password -- e.g. if you're trying to connect to python.org through
myproxy.com, it'll be looking for a username/password for python.org
instead of the needed myproxy.com.
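
To make the mismatch concrete, here is a minimal sketch that talks to
the stock HTTPPasswordMgr directly, with no network involved.  The
realm name and the proxy port are made up for illustration, and the
import fallback to urllib.request (the Python 3 home of this class) is
my assumption so the snippet runs on either version:

```python
# Sketch only: shows which lookup key finds the proxy credentials.
try:
    from urllib2 import HTTPPasswordMgr          # Python 2
except ImportError:
    from urllib.request import HTTPPasswordMgr   # Python 3

REALM = "proxy realm"        # hypothetical realm name
PROXY = "myproxy.com:3128"   # hypothetical proxy host:port

mgr = HTTPPasswordMgr()
mgr.add_password(REALM, PROXY, "john", "blah")

# Lookup keyed on the proxy itself, which is what the host argument
# passed in by ProxyBasicAuthHandler would give.
by_proxy = mgr.find_user_password(REALM, PROXY)

# Lookup keyed on the full target URL, which is what the code after
# revision 38092 does via req.get_full_url().
by_target = mgr.find_user_password(REALM, "http://python.org/")

print(by_proxy)
print(by_target)
```

The first lookup should find the credentials and the second should
come back empty, which is the situation that ends in the 407.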

Obviously nobody else uses authenticating proxies either, or at least
nobody who can be bothered to fix urllib2 :-(

A workaround is to supply a stupid HTTPPasswordMgr that always returns
the proxy credentials regardless of what the handler asks it for (only
tested with a perhaps-broken 2.5 install, since I've broken my 2.4
install):

import urllib2

class DumbProxyPasswordMgr:
    # Holds exactly one username/password pair and ignores realm and URI.
    def __init__(self):
        self.user = self.passwd = None
    def add_password(self, realm, uri, user, passwd):
        # realm and uri are accepted only for interface compatibility
        self.user = user
        self.passwd = passwd
    def find_user_password(self, realm, authuri):
        # always hand back the stored pair, whatever is being looked up
        return self.user, self.passwd

proxy_auth_handler = urllib2.ProxyBasicAuthHandler(DumbProxyPasswordMgr())
proxy_handler = urllib2.ProxyHandler({"http": "http://localhost:3128"})
# realm and uri are irrelevant here, so None is fine for both
proxy_auth_handler.add_password(None, None, 'john', 'blah')
opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
f = opener.open('http://python.org/')
print f.read()
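
Why the workaround sidesteps the bug can be checked without a proxy.
The sketch below repeats the password manager class so it stands alone
and leaves urllib2 out entirely; the realm string and lookup keys are
arbitrary values chosen for illustration:

```python
class DumbProxyPasswordMgr:
    """Same idea as the workaround: ignore the lookup key entirely."""
    def __init__(self):
        self.user = self.passwd = None
    def add_password(self, realm, uri, user, passwd):
        self.user = user
        self.passwd = passwd
    def find_user_password(self, realm, authuri):
        return self.user, self.passwd

mgr = DumbProxyPasswordMgr()
mgr.add_password(None, None, "john", "blah")

# Keyed on the target URL (what the broken handler asks for) ...
for_target = mgr.find_user_password("any realm", "http://python.org/")
# ... or keyed on the proxy host (what it should ask for).
for_proxy = mgr.find_user_password("any realm", "localhost:3128")

print(for_target == for_proxy)
```

The trade-off is that this manager can hold only one set of
credentials, so it is only suitable for a handler that talks to a
single proxy.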


Yuck, yuck, yuck!  I had realised the auth/proxies code in urllib2 was
buggy, but...  And all those hoops to jump through.

Also, if you're using 2.5 SVN HEAD, it seems revision 42133 broke
ProxyHandler in an attempt to fix the URL host:port syntax!

I'll try to get some fixes in tomorrow so that 2.5 isn't broken (or at
least flag the issues to let somebody else fix them), but no promises
as usual...


John



