[issue5072] urllib.open sends full URL after GET command instead of local path

Olemis Lang report at bugs.python.org
Mon Jan 26 20:22:54 CET 2009


New submission from Olemis Lang <olemis at gmail.com>:

Hello ... 

The first thing I have to say is that I searched the open issues and
found nothing similar to what I am going to report. If this ticket is
a duplicate, I apologize ...

Yesterday I was testing how to access the wiki pages of a
Trac [1]_ site and I realized that something was going wrong
(a bug? ...)

Initially the behavior was as follows:

{{{
#!python
>>> import urllib
>>> u = urllib.urlopen('http://localhost:8000/trac-dev')
>>> u.read()
'Environment not found'
>>> u.close()
}}}

And tracd reported a line like this:

{{{
127.0.0.1 - - [25/Jan/2009 17:32:08] "GET http://localhost:8000/trac-dev HTTP/1.0" 404 -
}}}

This means that a 404 (Not Found) status code was sent back to the
urllib client.

I tried to access the same page from my browser and tracd reported:

{{{
127.0.0.1 - - [25/Jan/2009 18:05:44] "GET /trac-dev HTTP/1.0" 200 -
}}}

The problem is obvious: urllib was sending the full URL after GET,
whereas it should send only the part that follows the network location.
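The part that should follow GET is what HTTP/1.1 (RFC 7230, section 5.3.1) calls the origin-form of the request target: the absolute path plus an optional `?query`, with no scheme, host or fragment. The full URL (absolute-form, section 5.3.2) is the form a client uses when the request is sent to a proxy. Below is a minimal Python 3 sketch of deriving the origin-form with `urllib.parse.urlsplit`; `request_target` is a hypothetical helper name, not part of urllib.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the origin-form request target (path plus optional query).

    Hypothetical helper: scheme, network location and fragment are
    dropped, so only what follows host:port ends up after the method.
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path must still be sent as "/"
    if parts.query:
        return path + "?" + parts.query
    return path


print(request_target("http://localhost:8000/trac-dev"))           # /trac-dev
print(request_target("http://localhost:8000/wiki?format=txt#x"))  # /wiki?format=txt
print(request_target("http://localhost:8000"))                    # /
```

`urlsplit` (unlike `urlparse`) leaves `;params` inside the path, which is what the request line needs anyway.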

I applied the following patch to urllib (yours will be better, I am 
sure about that ;)

{{{
#!diff

--- /usr/lib/python2.5/urllib.py        2008-07-31 13:40:40.000000000 -0500
+++ /media/urllib_unix.py     2009-01-26 09:48:54.000000000 -0500
@@ -270,6 +270,7 @@
     def open_http(self, url, data=None):
         """Use HTTP protocol."""
         import httplib
+        from urlparse import urlparse
         user_passwd = None
         proxy_passwd= None
         if isinstance(url, str):
@@ -312,12 +313,17 @@
         else:
             auth = None
         h = httplib.HTTP(host)
+        target = ''.join(sep + part for sep, part in \
+                                zip(['', ';', '?', '#'], \
+                                    urlparse(selector)[2:]) \
+                                if part)
+        print target
         if data is not None:
-            h.putrequest('POST', selector)
+            h.putrequest('POST', target)
             h.putheader('Content-Type', 'application/x-www-form-urlencoded')
             h.putheader('Content-Length', '%d' % len(data))
         else:
-            h.putrequest('GET', selector)
+            h.putrequest('GET', target)
         if proxy_auth: h.putheader('Proxy-Authorization', 'Basic %s' % proxy_auth)
         if auth: h.putheader('Authorization', 'Basic %s' % auth)
         if realhost: h.putheader('Host', realhost)


}}}
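To see what the patch's `target` expression computes, here is the same logic as a standalone Python 3 function (the `urlparse` module of 2.5 now lives in `urllib.parse`; the name `patch_target` is hypothetical). It is a sketch for inspecting the expression, not a replacement for the patch.

```python
from urllib.parse import urlparse


def patch_target(selector):
    """Rebuild the request target the way the patch's expression does.

    urlparse() returns (scheme, netloc, path, params, query, fragment);
    the expression keeps the last four fields, re-attaches each non-empty
    one with its separator, and discards scheme and netloc.
    """
    return "".join(sep + part
                   for sep, part in zip(["", ";", "?", "#"], urlparse(selector)[2:])
                   if part)


print(patch_target("http://localhost:8000/trac-dev"))                 # /trac-dev
print(patch_target("/trac-dev"))                                      # /trac-dev
print(patch_target("http://localhost:8000/wiki;v=1?format=txt#top"))  # /wiki;v=1?format=txt#top
print(repr(patch_target("http://localhost:8000")))                    # ''
```

The last two calls show two properties of the expression: it keeps the fragment, although HTTP excludes the fragment from the request target, and a URL with no path yields an empty string rather than `/`.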

And everything was «back» to normal:

{{{
#!python
>>> u = urllib.urlopen('http://localhost:8000/trac-dev')
>>> u.read()
    ... # Lots of beautiful HTML code ;)
>>> u.close()
}}}

... and tracd logged:

{{{
127.0.0.1 - - [25/Jan/2009 18:05:44] "GET /trac-dev HTTP/1.0" 200 -
}}}

I see the same behavior with both Python 2.5.1 and 2.5.2. I have not
installed Python 2.6.x, so I am not sure whether this issue also
affects newer versions of Python, and I don't know either whether it
is present in urllib2 ...

... so further research is needed, but IMO this is a serious bug :(

PS: If this is a bug, how could it have gone unnoticed so far? Is there
    any test case that asserts this kind of thing? I checked the
    `test.test_urllib` and `test.test_urllibnet` modules and found
    nothing at all ...

.. [1] Trac
       (http://trac.edgewall.org)

----------
components: Library (Lib)
messages: 80586
nosy: olemis
severity: normal
status: open
title: urllib.open sends full URL after GET command instead of local path
type: behavior
versions: Python 2.5

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5072>
_______________________________________

