[New-bugs-announce] [issue33480] Improvement suggestions for urllib.parse.urlparser
brent s.
report at bugs.python.org
Sun May 13 04:11:58 EDT 2018
New submission from brent s. <brent.saner at gmail.com>:
Currently, a parsed urlparse() object looks (roughly) like this:
urlparse('http://example.com/foo;key1=value1?key2=value2#key3=value3#key4=value4')
returns:
ParseResult(scheme='http', netloc='example.com', path='/foo', params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')
However, I recommend a couple of things:
0.) That ParseResult objects support dict emulation, e.g. so that one can run:
dict(parseresult_obj)
and get (using the example string above, with the classification corrected for RFC3986 compliance and common usage):
{'fragment': [('key4', 'value4')],
'netloc': 'example.com',
'params': [('key2', 'value2')],
'path': '/foo',
'query': [('key3', 'value3')],
'scheme': 'http'}
Obviously, fragment, params, and query could instead be serialized into a nested dict. I'm not sure which is the more Pythonic choice.
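As a rough sketch of suggestion 0 using only what the standard library offers today: ParseResult is a namedtuple, so _asdict() already yields a mapping, and parse_qsl() can split the key=value components into (key, value) pairs. The helper name parse_result_to_dict is hypothetical and not part of urllib.parse, and this keeps urlparse()'s current classification of the components rather than the reclassified one shown above.

```python
from urllib.parse import urlparse, parse_qsl


def parse_result_to_dict(result):
    """Hypothetical helper: turn a ParseResult into a plain dict.

    params, query and fragment are split into lists of (key, value)
    tuples; the other components are left as strings.
    """
    as_dict = dict(result._asdict())
    for name in ("params", "query", "fragment"):
        as_dict[name] = parse_qsl(as_dict[name])
    return as_dict


url = 'http://example.com/foo;key1=value1?key2=value2#key3=value3#key4=value4'
as_dict = parse_result_to_dict(urlparse(url))
```

Note that the second '#' is not treated as a separator here, so the fragment ends up as a single pair whose value still contains '#key4=value4'.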
1.) Better RFC3986 compliance.
Per RFC3986 § 3 (https://tools.ietf.org/html/rfc3986#section-3), the URL can be further split into separate components. For instance, although it is considered deprecated, should "userinfo" (e.g. "http://user:password@...") be parsed? At the very least, the port should be parsed out into a component separate from the netloc (or the userinfo parsed out separately from the netloc). This would assist in parsing host:port combinations in netlocs that contain both userinfo and a specified port, and would allow the port to be given as an int, making it easier to use with e.g. the socket lib.
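For comparison, here is a short sketch of what the existing result object exposes for the netloc sub-components. The username, password, hostname and port attributes are part of the current urllib.parse API, but they are properties rather than fields of the tuple, so they do not show up in the repr. The URL below is made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical URL containing both userinfo and an explicit port.
parsed = urlparse('http://user:password@example.com:8080/foo')

netloc = parsed.netloc      # userinfo, host and port as a single string
username = parsed.username  # userinfo part before the ':'
password = parsed.password  # userinfo part after the ':'
hostname = parsed.hostname  # host only, lowercased
port = parsed.port          # int, or None when no port is given
```

Whether these should become first-class fields of the tuple, as suggested above, is a separate design question.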
2.) If a component is not present, I suggest it be a None object instead of an empty string.
e.g.:
urlparse('http://example.com/foo')
Would return:
ParseResult(scheme='http', netloc='example.com', path='/foo', params=None, query=None, fragment=None)
instead of
ParseResult(scheme='http', netloc='example.com', path='/foo', params='', query='', fragment='')
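A minimal sketch of suggestion 2 as a wrapper around the current behaviour, assuming it is acceptable to post-process the result. The helper name empty_to_none is hypothetical, and it deliberately touches only params, query and fragment so that the netloc-based properties keep working.

```python
from urllib.parse import urlparse


def empty_to_none(result):
    """Hypothetical helper: replace empty trailing components with None."""
    updates = {
        name: None
        for name in ("params", "query", "fragment")
        if getattr(result, name) == ''
    }
    # _replace() is the standard namedtuple method; it returns a new tuple.
    return result._replace(**updates)


normalized = empty_to_none(urlparse('http://example.com/foo'))
```

This makes "component absent" distinguishable with an `is None` check, though it cannot tell an absent query from an explicitly empty one (e.g. a trailing '?'), since urlparse() reports both as ''.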
----------
components: Library (Lib)
messages: 316454
nosy: bsaner
priority: normal
severity: normal
status: open
title: Improvement suggestions for urllib.parse.urlparser
type: enhancement
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue33480>
_______________________________________