[ python-Bugs-680577 ] urllib2 authentication problem

SourceForge.net noreply at sourceforge.net
Sat Apr 15 20:45:29 CEST 2006


Bugs item #680577, was opened at 2003-02-05 00:22
Message generated for change (Comment added) made by jjlee
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=680577&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: GaryD (gazzadee)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2 authentication problem

Initial Comment:
I've found a problem using the authentication in urllib2.

When host names are matched up in order to find a
password, putting the protocol in the address
makes it look like a different address, e.g.:

I create a HTTPBasicAuthHandler with a
HTTPPasswordMgrWithDefaultRealm, and add the tuple
(None, "http://proxy.blah.com:17828", "foo", "bar") to it.

I then set up the proxy to use
http://proxy.blah.com:17828 (which requires
authentication).

When I connect, the password lookup fails, because it
is trying to find a match for "proxy.blah.com:17828"
rather than "http://proxy.blah.com:17828"

This problem doesn't exist if I pass
"proxy.blah.com:17828" to the password manager.

There seems to be some stuff in HTTPPasswordMgr to deal
with variations on site names, but I guess it's not
working in this case (unless this is intentional).
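
The lookup described above can be checked without a running proxy. This is a minimal sketch, assuming Python 3's urllib.request (the successor of urllib2, not the Python 2.2 module this report was filed against). The host, port, and credentials are the placeholders from the report:

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm

# Register the credentials under the URI *with* its scheme, as in the report.
mgr = HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, "http://proxy.blah.com:17828", "foo", "bar")

# Look the credentials up by bare host:port, the form the report says
# the lookup is performed with.
found = mgr.find_user_password(None, "proxy.blah.com:17828")
print(found)
```

A failed lookup comes back as (None, None), which is the symptom described here. On current Python 3 releases both spellings should reduce to the same key, so the stored pair is returned.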

Version Info:
Python 2.2 (#1, Feb 24 2002, 16:21:58)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)]
on linux-i386


----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-04-15 19:45

Message:
Logged In: YES 
user_id=261020

This issue is fixed by patch 1470846.


----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2003-12-16 12:49

Message:
Logged In: YES 
user_id=261020

Thanks! 
 
It seems .reduce_uri() tries to cope with hostnames as well as 
absoluteURIs.  I don't understand why it wants to do that, but it 
fails, because it doesn't anticipate what urlparse does when a 
port is present: 
 
>>> urlparse.urlparse("foo.bar.com") 
('', '', 'foo.bar.com', '', '', '') 
>>> urlparse.urlparse("foo.bar.com:80") 
('foo.bar.com', '', '80', '', '', '') 
 
I haven't checked, but I assume it's just incorrect use of 
urlparse to pass it a hostname. 
 
Of course, if it's "fixed" to only accept absoluteURIs, it will 
break existing code, so I guess it must be fixed for 
hostnames. :-(( 
 
Also, I think .is_suburi("/foo/spam", "/foo/eggs") should return 
False, but it returns True, and the .http_error_40x() methods use 
req.get_host() when they should be using req.get_full_url() 
(from a quick look at RFC 2617). 
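
To illustrate the urlparse point above, here is a rough sketch of a reduction step that accepts either a bare host[:port] or an absolute URI, plus a path-prefix check that treats sibling paths as unrelated. It is written against Python 3's urllib.parse. The names reduce_uri_sketch and is_suburi_sketch are hypothetical helpers for illustration, not the urllib2 methods and not the contents of any patch:

```python
from urllib.parse import urlsplit


def reduce_uri_sketch(uri):
    """Return (authority, path) for an absolute URI or a bare host[:port]."""
    parts = urlsplit(uri)
    if parts.netloc:
        # Absolute URI: the parser found an authority component.
        return parts.netloc, parts.path or "/"
    # No authority parsed: assume the whole string is host or host:port,
    # rather than trusting how the parser split it around the colon.
    return uri, "/"


def is_suburi_sketch(base, test):
    """True if `test` equals `base` or lies below it in the path tree."""
    if base == test:
        return True
    if base[0] != test[0]:
        return False
    prefix = base[1] if base[1].endswith("/") else base[1] + "/"
    return test[1].startswith(prefix)


print(reduce_uri_sketch("http://foo.bar.com:80/spam"))
print(reduce_uri_sketch("foo.bar.com:80"))
print(is_suburi_sketch(("foo.bar.com:80", "/foo/spam"),
                       ("foo.bar.com:80", "/foo/eggs")))
```

The key design choice is to branch on whether an authority was parsed at all, instead of reading the scheme and path fields of a string that was never a URI.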

----------------------------------------------------------------------

Comment By: GaryD (gazzadee)
Date: 2003-12-16 03:10

Message:
Logged In: YES 
user_id=693152

Okay, I have attached a file that replicates this problem.

If you run it as is (replacing the proxy name and address
with something suitable), then it will fail (requiring proxy
authentication).

If you uncomment line 23 (which specifies the password
without the scheme), then it will work successfully.

Technical Info:
 * For a proxy, I am using Squid Cache version 2.4.STABLE7
for i586-mandrake-linux-gnu...
 * I have replicated the problem with Python 2.2.2 on Linux,
and Python 2.3.2 on Windows XP.

----------------------------------------------------------------------

Comment By: GaryD (gazzadee)
Date: 2003-12-16 02:08

Message:
Logged In: YES 
user_id=693152

This was a while ago, and my memory has faded. I'll try to
respond intelligently.

I think the question was with the way the password manager
looks up passwords, rather than anything else.

I am pretty sure that the problem is not to do with the URI
passed to urlopen(). In the code shown below, the problem
was solely dependent on whether I added the line:
    (None, "blah.com:17828", "foo", "bar")
...to the HTTPPasswordMgrWithDefaultRealm object.

If that password set was added, then the password lookup for
the proxy was successful, and urlopen() worked. If that
password set was not included, then the password lookup for
the proxy was unsuccessful (despite the inclusion of the
other 2, similar, password sets - "http://blah.com:17828"
and "blah.com"), and urlopen() would fail. Hence my
suspicion that the password manager did not fully remove the
scheme, despite attempts to do so.

I'll see if I can set it up on the latest python and get it
to happen again.

Just as an explanation, the situation was that I was running
an authenticating proxy on a non-standard port (in order to
avoid clashing with the normal proxy), in order to test out
how my download code would work through an authenticating proxy.

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2003-12-01 00:14

Message:
Logged In: YES 
user_id=261020

The problem seems to be with the port (:17828), not the URL 
scheme (http:), because HTTPPasswordMgr.reduce_uri() 
removes the scheme. 
 
RFC 2617 (top of page 3) says nothing about removing the 
port from the URI.  urllib2 does not remove the port, so this 
doesn't appear to be a bug. 
 
I guess gazzadee was doing a urlopen with a different 
canonical root URI (RFC 2617, top of page 3 again) to the one 
he gave in add_password (ie. the URL he passed to urlopen() 
had no explicit port number). 

----------------------------------------------------------------------

Comment By: GaryD (gazzadee)
Date: 2003-02-09 23:17

Message:
Logged In: YES 
user_id=693152

Okay, the same problem crops up in Python 2.2.2 running
under cygwin on Win XP

Version Info:
Python 2.2.2 (#1, Dec 31 2002, 12:24:34) 
[GCC 3.2 20020927 (prerelease)] on cygwin

Here's the pertinent section of my test file (passwords and
URL changed to protect the innocent):


    # Setup proxy
    proxy_handler = ProxyHandler({"http": "http://blah.com:17828"})

    # Setup authentication
    pass_mgr = HTTPPasswordMgrWithDefaultRealm()
    for passwd in [
                   (None, "http://blah.com:17828", "foo", "bar"),
#                   (None, "blah.com:17828", "foo", "bar"),  # Works if this line is uncommented
                   (None, "blah.com", "foo", "bar"),
                  ]:
        print("Adding password set (%s, %s, %s, %s)" % passwd)
        pass_mgr.add_password(*passwd)
    auth_handler = HTTPBasicAuthHandler(pass_mgr)
    proxy_auth_handler = ProxyBasicAuthHandler(pass_mgr)

    # Now build a new URL opener and install it
    opener = build_opener(proxy_handler, proxy_auth_handler,
                          auth_handler, HTTPHandler)
    install_opener(opener)

    # Now try to open a file and see what happens
    request = Request("http://www.google.com")
    try:
        remotefile = urlopen(request)
    except HTTPError, ex:
        print("Unable to download file due to HTTP Error %d (%s)."
              % (ex.code, ex.msg))
        return
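
For readers on current Python, the setup above might be ported roughly as follows. This is a sketch assuming Python 3's urllib.request. The function build_proxy_opener is a hypothetical helper, and the proxy address and credentials are the placeholders from the snippet. Nothing is sent over the network until the opener is actually used:

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    ProxyBasicAuthHandler,
    ProxyHandler,
    build_opener,
)


def build_proxy_opener(proxy_url, user, password):
    """Build an opener that routes HTTP through an authenticating proxy."""
    pass_mgr = HTTPPasswordMgrWithDefaultRealm()
    # Realm None means "use these credentials for any realm".
    pass_mgr.add_password(None, proxy_url, user, password)
    return build_opener(
        ProxyHandler({"http": proxy_url}),
        ProxyBasicAuthHandler(pass_mgr),
        HTTPBasicAuthHandler(pass_mgr),
    )


opener = build_proxy_opener("http://blah.com:17828", "foo", "bar")
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```

The resulting opener can be used directly via opener.open(url), or installed globally with install_opener(opener) as in the original snippet.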



----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2003-02-07 23:21

Message:
Logged In: YES 
user_id=163326

Can you please retry with Python 2.2.2?

It seems that a related bug was fixed for 2.2.2:
http://python.org/2.2.2/NEWS.txt has an entry:

"""
- In urllib2.py: fix proxy config with user+pass authentication.
  [SF patch 527518]
"""

----------------------------------------------------------------------
