[Python-checkins] r83210 - in python/branches/release31-maint: Lib/test/test_robotparser.py Lib/urllib/robotparser.py

senthil.kumaran python-checkins at python.org
Wed Jul 28 18:30:46 CEST 2010


Author: senthil.kumaran
Date: Wed Jul 28 18:30:46 2010
New Revision: 83210

Log:
Merged revisions 83209 via svnmerge from 
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r83209 | senthil.kumaran | 2010-07-28 21:57:56 +0530 (Wed, 28 Jul 2010) | 3 lines
  
  Fix Issue6325 - robotparse to honor urls with query strings.
........


Modified:
   python/branches/release31-maint/   (props changed)
   python/branches/release31-maint/Lib/test/test_robotparser.py
   python/branches/release31-maint/Lib/urllib/robotparser.py

Modified: python/branches/release31-maint/Lib/test/test_robotparser.py
==============================================================================
--- python/branches/release31-maint/Lib/test/test_robotparser.py	(original)
+++ python/branches/release31-maint/Lib/test/test_robotparser.py	Wed Jul 28 18:30:46 2010
@@ -205,6 +205,17 @@
 RobotTest(13, doc, good, bad, agent="googlebot")
 
 
+# 14. For issue #6325 (query string support)
+doc = """
+User-agent: *
+Disallow: /some/path?name=value
+"""
+
+good = ['/some/path']
+bad = ['/some/path?name=value']
+
+RobotTest(14, doc, good, bad)
+
 
 class NetworkTestCase(unittest.TestCase):
 

Modified: python/branches/release31-maint/Lib/urllib/robotparser.py
==============================================================================
--- python/branches/release31-maint/Lib/urllib/robotparser.py	(original)
+++ python/branches/release31-maint/Lib/urllib/robotparser.py	Wed Jul 28 18:30:46 2010
@@ -129,8 +129,10 @@
             return True
         # search for given user agent matches
         # the first match counts
-        url = urllib.parse.quote(
-            urllib.parse.urlparse(urllib.parse.unquote(url))[2])
+        parsed_url = urllib.parse.urlparse(urllib.parse.unquote(url))
+        url = urllib.parse.urlunparse(('','',parsed_url.path,
+            parsed_url.params,parsed_url.query, parsed_url.fragment))
+        url = urllib.parse.quote(url)
         if not url:
             url = "/"
         for entry in self.entries:


More information about the Python-checkins mailing list