[Python-Dev] [RFC] urlparse - parse query facility
Senthil Kumaran
orsenthil at gmail.com
Tue Jun 12 22:39:10 CEST 2007
Hi all,
This mail is a request for comments on changes to urlparse module. We understand
that urlparse returns the 'complete query' value as the query
component and does not
provide the facilities to separate the query components. User will have to use
the cgi module (cgi.parse_qs) to get the query parsed.
There has been a discussion in the past, on having a method of parse query
string available from urlparse module itself. [1]
To implement the query parse feature in urlparse module, we can:
a) import cgi and call cgi module's query_ps.
This approach will have problems as it
i) imports cgi for urlparse module.
ii) cgi module in turn imports urllib and urlparse.
b) Implement a stand alone query parsing facility in urlparse *AS IN*
cgi module.
Below method implements the urlparse_qs(url, keep_blank_values,strict_parsing)
that will help in parsing the query component of the url. It behaves same as the
cgi.parse_qs.
Please let me know your comments on the below code.
----------------------------------------------------------------------
def unquote(s):
"""unquote('abc%20def') -> 'abc def'."""
res = s.split('%')
for i in xrange(1, len(res)):
item = res[i]
try:
res[i] = _hextochr[item[:2]] + item[2:]
except KeyError:
res[i] = '%' + item
except UnicodeDecodeError:
res[i] = unichr(int(item[:2], 16)) + item[2:]
return "".join(res)
def urlparse_qs(url, keep_blank_values=0, strict_parsing=0):
"""Parse a URL query string and return the components as a dictionary.
Based on the cgi.parse_qs method.This is a utility function provided
with urlparse so that users need not use cgi module for
parsing the url query string.
Arguments:
url: URL with query string to be parsed
keep_blank_values: flag indicating whether blank values in
URL encoded queries should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings. The default false value indicates that
blank values are to be ignored and treated as if they were
not included.
strict_parsing: flag indicating what to do with parsing errors.
If false (the default), errors are silently ignored.
If true, errors raise a ValueError exception.
"""
scheme, netloc, url, params, querystring, fragment = urlparse(url)
pairs = [s2 for s1 in querystring.split('&') for s2 in s1.split(';')]
query = []
for name_value in pairs:
if not name_value and not strict_parsing:
continue
nv = name_value.split('=', 1)
if len(nv) != 2:
if strict_parsing:
raise ValueError, "bad query field: %r" % (name_value,)
# Handle case of a control-name with no equal sign
if keep_blank_values:
nv.append('')
else:
continue
if len(nv[1]) or keep_blank_values:
name = unquote(nv[0].replace('+', ' '))
value = unquote(nv[1].replace('+', ' '))
query.append((name, value))
dict = {}
for name, value in query:
if name in dict:
dict[name].append(value)
else:
dict[name] = [value]
return dict
----------------------------------------------------------------------
Testing:
$ python
Python 2.6a0 (trunk, Jun 10 2007, 12:04:03)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> dir(urlparse)
['BaseResult', 'MAX_CACHE_SIZE', 'ParseResult', 'SplitResult', '__all__',
'__builtins__', '__doc__', '__file__', '__name__', '_parse_cache',
'_splitnetloc', '_splitparams', 'clear_cache', 'non_hierarchical',
'scheme_chars', 'test', 'test_input', 'unquote', 'urldefrag', 'urljoin',
'urlparse', 'urlparse_qs', 'urlsplit', 'urlunparse', 'urlunsplit',
'uses_fragment', 'uses_netloc', 'uses_params', 'uses_query', 'uses_relative']
>>> URL =
>>> 'http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=utf-8&q=south+africa+travel+cape+town'
>>> print urlparse.urlparse_qs(URL)
{'q': ['south africa travel cape town'], 'oe': ['utf-8'], 'ie': ['UTF-8'],
'hl': ['en']}
>>> print urlparse.urlparse_qs(URL,keep_blank_values=1)
{'q': ['south africa travel cape town'], 'ie': ['UTF-8'], 'oe': ['utf-8'],
'lr': [''], 'hl': ['en']}
>>>
Thanks,
Senthil
[1] http://mail.python.org/pipermail/tutor/2002-August/016823.html
--
O.R.Senthil Kumaran
http://phoe6.livejournal.com
More information about the Python-Dev
mailing list