[issue5072] urllib.open sends full URL after GET command instead of local path
Olemis Lang
report at bugs.python.org
Mon Jan 26 20:22:54 CET 2009
New submission from Olemis Lang <olemis at gmail.com>:
Hello ...
First of all, I searched the open issues and found nothing similar
to what I am about to report. If this ticket is a duplicate, I
apologize ...
Yesterday I was testing how to access the wiki pages in a
Trac [1]_ site and I realized that something wrong was happening
(a bug? ...)
Initially the behavior was as follows :
{{{
#!python
>>> u = urllib.urlopen('http://localhost:8000/trac-dev')
>>> u.read()
'Environment not found'
>>> u.close()
}}}
And tracd reported a line like this
{{{
127.0.0.1 - - [25/Jan/2009 17:32:08] "GET http://localhost:8000/trac-dev HTTP/1.0" 404 -
}}}
This means that a 404 'Not Found' status code was sent back to the
urllib client.
I tried to access the same page from my browser and tracd reported
{{{
127.0.0.1 - - [25/Jan/2009 18:05:44] "GET /trac-dev HTTP/1.0" 200 -
}}}
The problem is obvious ... urllib was sending the full URL after GET
and it should send only the string after the network location.
I applied the following patch to urllib (yours will be better, I am
sure about that ;)
{{{
#!diff
--- /usr/lib/python2.5/urllib.py 2008-07-31 13:40:40.000000000 -0500
+++ /media/urllib_unix.py 2009-01-26 09:48:54.000000000 -0500
@@ -270,6 +270,7 @@
     def open_http(self, url, data=None):
         """Use HTTP protocol."""
         import httplib
+        from urlparse import urlparse
         user_passwd = None
         proxy_passwd= None
         if isinstance(url, str):
@@ -312,12 +313,17 @@
         else:
             auth = None
         h = httplib.HTTP(host)
+        target = ''.join(sep + part for sep, part in \
+                         zip(['', ';', '?', '#'], \
+                             urlparse(selector)[2:]) \
+                         if part)
+        print target
         if data is not None:
-            h.putrequest('POST', selector)
+            h.putrequest('POST', target)
             h.putheader('Content-Type', 'application/x-www-form-urlencoded')
             h.putheader('Content-Length', '%d' % len(data))
         else:
-            h.putrequest('GET', selector)
+            h.putrequest('GET', target)
         if proxy_auth: h.putheader('Proxy-Authorization', 'Basic %s' % proxy_auth)
         if auth: h.putheader('Authorization', 'Basic %s' % auth)
         if realhost: h.putheader('Host', realhost)
}}}
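For anyone who wants to try the `target` expression from the patch in isolation, here is a minimal sketch. It assumes Python 3, where `urlparse` lives in `urllib.parse` (the patch itself targets Python 2.5 and imports it from the `urlparse` module), and the function name `build_target` is made up for illustration only.

```python
from urllib.parse import urlparse  # Python 3 home of the Python 2 urlparse module


def build_target(selector):
    """Mirror the expression from the patch (hypothetical helper name).

    urlparse(...)[2:] is the tuple (path, params, query, fragment); each
    non-empty part is glued back on with its usual separator.
    """
    parts = urlparse(selector)[2:]
    separators = ['', ';', '?', '#']
    return ''.join(sep + part for sep, part in zip(separators, parts) if part)


print(build_target('http://localhost:8000/trac-dev'))  # /trac-dev
print(build_target('/trac-dev'))                       # /trac-dev
```

Two properties of the expression are worth noting: a URL with no path at all (such as `http://localhost:8000`) yields an empty string rather than `/`, and a fragment, if present, is kept in the result.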
And everything was «back» to normal ...
{{{
#!python
>>> u = urllib.urlopen('http://localhost:8000/trac-dev')
>>> u.read()
... # Lots of beautiful HTML code ;)
>>> u.close()
}}}
... tracd outputted ...
{{{
127.0.0.1 - - [25/Jan/2009 18:05:44] "GET /trac-dev HTTP/1.0" 200 -
}}}
The same picture is shown when using both Python 2.5.1 and 2.5.2 ...
I have not installed Python 2.6.x, so I am not sure whether this
issue has propagated to newer versions of Python ... and I don't
know either whether it is also present in urllib2 ...
... so further research is needed, but IMO this is a serious bug :(
PS: If this is a bug ... how could it have stayed hidden for so long?
Is there any test case that asserts this kind of thing? I checked the
`test.test_urllib` and `test.test_urllibnet` modules and I saw
nothing of the kind ...
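A check of the kind asked about here could look roughly like the sketch below. It is not taken from the Python test suite. It assumes Python 3 (`http.server`, `urllib.request`), and the names `RecordingHandler`, `seen_targets` and `fetch_and_record` are made up for illustration. Proxy handling is switched off explicitly so that environment settings such as `http_proxy` cannot change the request line.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_targets = []  # request targets exactly as they arrived on the request line


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that records what follows GET on the request line."""

    def do_GET(self):
        seen_targets.append(self.path)
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def fetch_and_record(path='/trac-dev'):
    """Fetch `path` from a throwaway local server and return the target it saw."""
    server = HTTPServer(('127.0.0.1', 0), RecordingHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        # An empty ProxyHandler mapping disables auto-detected proxies.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = 'http://127.0.0.1:%d%s' % (server.server_address[1], path)
        response = opener.open(url, timeout=5)
        try:
            response.read()
        finally:
            response.close()
    finally:
        server.shutdown()
        server.server_close()
    return seen_targets[-1]


# Expected to print /trac-dev when the client sends only the local path.
print(fetch_and_record())
```

A test built on this would simply assert that the recorded target equals the path that was requested, and would fail if the full URL showed up after GET.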
.. [1] Trac
(http://trac.edgewall.org)
----------
components: Library (Lib)
messages: 80586
nosy: olemis
severity: normal
status: open
title: urllib.open sends full URL after GET command instead of local path
type: behavior
versions: Python 2.5
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5072>
_______________________________________