[ python-Bugs-680577 ] urllib2 authentication problem
SourceForge.net
noreply at sourceforge.net
Sat Apr 15 20:45:29 CEST 2006
Bugs item #680577, was opened at 2003-02-05 00:22
Message generated for change (Comment added) made by jjlee
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=680577&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: GaryD (gazzadee)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2 authentication problem
Initial Comment:
I've found a problem using the authentication in urllib2.
When host names are matched up in order to find a
password, putting the protocol in the address
makes it look like a different address. E.g.:
I create an HTTPBasicAuthHandler with an
HTTPPasswordMgrWithDefaultRealm, and add the tuple
(None, "http://proxy.blah.com:17828", "foo", "bar") to it.
I then set up the proxy to use
http://proxy.blah.com:17828 (which requires
authentication).
When I connect, the password lookup fails, because it
is trying to find a match for "proxy.blah.com:17828"
rather than "http://proxy.blah.com:17828"
This problem doesn't exist if I pass
"proxy.blah.com:17828" to the password manager.
There seems to be some stuff in HTTPPasswordMgr to deal
with variations on site names, but I guess it's not
working in this case (unless this is intentional).
Version Info:
Python 2.2 (#1, Feb 24 2002, 16:21:58)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)]
on linux-i386
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2006-04-15 19:45
Message:
Logged In: YES
user_id=261020
This issue is fixed by patch 1470846.
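A minimal sketch for checking the lookup on a current interpreter. It assumes Python 3, where urllib2 lives on as urllib.request; the host, port and credentials are the placeholders from the original report. Both lookups should return the stored pair, whether or not the scheme is given.

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm

# Placeholder proxy address and credentials taken from the original report.
mgr = HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, "http://proxy.blah.com:17828", "foo", "bar")

# Look the password up both ways: full URI, and authority only (no scheme).
with_scheme = mgr.find_user_password(None, "http://proxy.blah.com:17828")
without_scheme = mgr.find_user_password(None, "proxy.blah.com:17828")
print(with_scheme, without_scheme)
```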
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2003-12-16 12:49
Message:
Logged In: YES
user_id=261020
Thanks!
It seems .reduce_uri() tries to cope with hostnames as well as
absoluteURIs. I don't understand why it wants to do that, but it
fails, because it doesn't anticipate what urlparse does when a
port is present:
>>> urlparse.urlparse("foo.bar.com")
('', '', 'foo.bar.com', '', '', '')
>>> urlparse.urlparse("foo.bar.com:80")
('foo.bar.com', '', '80', '', '', '')
I haven't checked, but I assume it's just incorrect use of
urlparse to pass it a hostname.
Of course, if it's "fixed" to only accept absoluteURIs, it will
break existing code, so I guess it must be fixed for
hostnames. :-((
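One way to cope with both forms, without handing a bare hostname to the URL parser, is to look for the "://" separator first. This is only a sketch: it assumes Python 3's urllib.parse.urlsplit in place of the urlparse module, the name reduce_uri_sketch is hypothetical, and it is not the code from any actual patch.

```python
from urllib.parse import urlsplit


def reduce_uri_sketch(uri):
    """Return (authority, path) for an absolute URI or a bare host[:port]."""
    if "://" in uri:
        # Absolute URI: let the parser split it, defaulting the path to "/".
        parts = urlsplit(uri)
        return parts.netloc, parts.path or "/"
    # Bare authority, optionally followed by a path: split by hand so that
    # "host:port" is never mistaken for "scheme:path".
    authority, _, rest = uri.partition("/")
    return authority, "/" + rest


print(reduce_uri_sketch("http://proxy.blah.com:17828"))
print(reduce_uri_sketch("proxy.blah.com:17828"))
```

With this approach the two spellings of the proxy address reduce to the same (authority, path) pair, so a password stored under one can be found under the other.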
Also, I think .is_suburi("/foo/spam", "/foo/eggs") should return
False, but it returns True, and the .http_error_40x() methods use
req.get_host() when they should be using req.get_full_url()
(from a quick look at RFC 2617).
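To illustrate the sub-URI point, here is a sketch of a path test that treats "/foo/eggs" as being under "/foo" but not under "/foo/spam". The helper name is_subpath_sketch is hypothetical, and it compares paths only, ignoring the authority.

```python
def is_subpath_sketch(base, test):
    """Return True if path `test` equals `base` or lies underneath it."""
    if base == test:
        return True
    # Compare against the base with a trailing slash, so that only whole
    # path segments match ("/foo" covers "/foo/eggs" but not "/foobar").
    prefix = base if base.endswith("/") else base + "/"
    return test.startswith(prefix)


print(is_subpath_sketch("/foo/spam", "/foo/eggs"))
print(is_subpath_sketch("/foo", "/foo/eggs"))
```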
----------------------------------------------------------------------
Comment By: GaryD (gazzadee)
Date: 2003-12-16 03:10
Message:
Logged In: YES
user_id=693152
Okay, I have attached a file that replicates this problem.
If you run it as is (replacing the proxy name and address
with something suitable), then it will fail (requiring proxy
authentication).
If you uncomment line 23 (which specifies the password
without the scheme), then it will work successfully.
Technical Info:
* For a proxy, I am using Squid Cache version 2.4.STABLE7
for i586-mandrake-linux-gnu...
* I have replicated the problem with Python 2.2.2 on Linux,
and Python 2.3.2 on Windows XP.
----------------------------------------------------------------------
Comment By: GaryD (gazzadee)
Date: 2003-12-16 02:08
Message:
Logged In: YES
user_id=693152
This was a while ago, and my memory has faded. I'll try to
respond intelligently.
I think the question was with the way the password manager
looks up passwords, rather than anything else.
I am pretty sure that the problem is not to do with the URI
passed to urlopen(). In the code shown below, the problem
was solely dependent on whether I added the line:
(None, "blah.com:17828", "foo", "bar")
...to the HTTPPasswordMgrWithDefaultRealm object.
If that password set was added, then the password lookup for
the proxy was successful, and urlopen() worked. If that
password set was not included, then the password lookup for
the proxy was unsuccessful (despite the inclusion of the
other 2, similar, password sets - "http://blah.com:17828"
and "blah.com"), and urlopen() would fail. Hence my
suspicion that the password manager did not fully remove the
scheme, despite attempts to do so.
I'll see if I can set it up on the latest python and get it
to happen again.
Just as an explanation, the situation was that I was running
an authenticating proxy on a non-standard port (in order to
avoid clashing with the normal proxy), in order to test out
how my download code would work through an authenticating proxy.
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2003-12-01 00:14
Message:
Logged In: YES
user_id=261020
The problem seems to be with the port (:17828), not the URL
scheme (http:), because HTTPPasswordMgr.reduce_uri()
removes the scheme.
RFC 2617 (top of page 3) says nothing about removing the
port from the URI. urllib2 does not remove the port, so this
doesn't appear to be a bug.
I guess gazzadee was doing a urlopen with a different
canonical root URI (RFC 2617, top of page 3 again) to the one
he gave in add_password (ie. the URL he passed to urlopen()
had no explicit port number).
----------------------------------------------------------------------
Comment By: GaryD (gazzadee)
Date: 2003-02-09 23:17
Message:
Logged In: YES
user_id=693152
Okay, the same problem crops up in Python 2.2.2 running
under Cygwin on Win XP.
Version Info:
Python 2.2.2 (#1, Dec 31 2002, 12:24:34)
[GCC 3.2 20020927 (prerelease)] on cygwin
Here's the pertinent section of my test file (passwords and
URL changed to protect the innocent):
    # Excerpt from inside a function; all names are imported from urllib2
    # Set up proxy
    proxy_handler = ProxyHandler({"http": "http://blah.com:17828"})
    # Set up authentication
    pass_mgr = HTTPPasswordMgrWithDefaultRealm()
    for passwd in [
            (None, "http://blah.com:17828", "foo", "bar"),
            # (None, "blah.com:17828", "foo", "bar"),  # Works if this line is uncommented
            (None, "blah.com", "foo", "bar"),
            ]:
        print("Adding password set (%s, %s, %s, %s)" % passwd)
        pass_mgr.add_password(*passwd)
    auth_handler = HTTPBasicAuthHandler(pass_mgr)
    proxy_auth_handler = ProxyBasicAuthHandler(pass_mgr)
    # Now build a new URL opener and install it
    opener = build_opener(proxy_handler, proxy_auth_handler,
                          auth_handler, HTTPHandler)
    install_opener(opener)
    # Now try to open a file and see what happens
    request = Request("http://www.google.com")
    try:
        remotefile = urlopen(request)
    except HTTPError, ex:
        print("Unable to download file due to HTTP Error %d (%s)."
              % (ex.code, ex.msg))
        return
----------------------------------------------------------------------
Comment By: Gerhard Häring (ghaering)
Date: 2003-02-07 23:21
Message:
Logged In: YES
user_id=163326
Can you please retry with Python 2.2.2?
It seems that a related bug was fixed for 2.2.2:
http://python.org/2.2.2/NEWS.txt has an entry:
"""
- In urllib2.py: fix proxy config with user+pass authentication.
  [SF patch 527518]
"""
----------------------------------------------------------------------