[New-bugs-announce] [issue33480] Improvement suggestions for urllib.parse.urlparser

brent s. report at bugs.python.org
Sun May 13 04:11:58 EDT 2018


New submission from brent s. <brent.saner at gmail.com>:

Currently, a parsed urlparse() object looks (roughly) like this:

urlparse('http://example.com/foo;key1=value1?key2=value2#key3=value3#key4=value4')

returns:

ParseResult(scheme='http', netloc='example.com', path='/foo', params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')

However, I recommend a few changes:

0.) That ParseResult objects support dict emulation, e.g. so that one can run:

        dict(parseresult_obj)

    and get (using the example string above, with the classification corrected for RFC3986 compliance and common usage):

        {'fragment': [('key4', 'value4')],
         'netloc': 'example.com',
         'params': [('key2', 'value2')],
         'path': '/foo',
         'query': [('key3', 'value3')],
         'scheme': 'http'}

    Alternatively, fragment, params, and query could each be serialized into a nested dict rather than a list of tuples. I'm not sure which is more Pythonic.
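
To make the idea concrete, here is a minimal sketch of what such a conversion might look like. The helper name parse_result_to_dict is hypothetical and not part of urllib.parse; it assumes the list-of-tuples form and leans on parse_qsl to split the key=value pairs. It uses a simplified URL with a single fragment.

```python
from urllib.parse import urlparse, parse_qsl


def parse_result_to_dict(result):
    """Hypothetical helper (not in urllib.parse): ParseResult -> plain dict."""
    # ParseResult is a namedtuple, so _asdict() gives field name -> value.
    data = dict(result._asdict())
    # Split the key=value style components into lists of (key, value) tuples.
    for field in ('params', 'query', 'fragment'):
        data[field] = parse_qsl(data[field])
    return data


result = urlparse('http://example.com/foo;key1=value1?key2=value2#key3=value3')
as_dict = parse_result_to_dict(result)
```

A nested-dict variant would only need dict(parse_qsl(...)) in the loop, at the cost of dropping repeated keys.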

1.) Better RFC3986 compliance.
    Per RFC3986 § 3 (https://tools.ietf.org/html/rfc3986#section-3), the URL can be split into further components. For instance, should "userinfo" (e.g. "http://user:password@...") be parsed, even though the "user:password" form is considered deprecated? At the very least, the port should be parsed out into a component separate from the netloc (or the userinfo parsed out separately from the netloc). This would help when parsing host:port combinations in netlocs that contain both userinfo and an explicit port, and would allow the port to be given as an int, which is easier to use with e.g. the socket lib.
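
As a point of comparison for the split proposed above, here is a minimal sketch of pulling these pieces out of a netloc. It relies on the username, password, hostname, and port attributes of the result object, which are computed from netloc rather than being fields of the tuple itself. The URL is made up for illustration.

```python
from urllib.parse import urlparse

# Illustrative URL with both userinfo and an explicit port.
result = urlparse('http://user:password@example.com:8080/foo')

# netloc still carries everything between "//" and the path.
netloc = result.netloc

# These attributes are derived from netloc; port comes back as an int
# (or None when no port is given).
username = result.username
password = result.password
host = result.hostname
port = result.port
```

Because these values are not tuple fields, they do not show up in the repr or when unpacking the result, which is the gap the suggestion above is about.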

2.) If a component is not present, I suggest it be a None object instead of an empty string.
    e.g.:

        urlparse('http://example.com/foo')

    would return:

        ParseResult(scheme='http', netloc='example.com', path='/foo', params=None, query=None, fragment=None)

    instead of

        ParseResult(scheme='http', netloc='example.com', path='/foo', params='', query='', fragment='')
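
A minimal sketch of the behavior suggested in item 2, written as a wrapper around the current function. The helper name empty_to_none is hypothetical and not part of urllib.parse; it only post-processes the result and does not change how parsing itself works.

```python
from urllib.parse import urlparse


def empty_to_none(result):
    """Hypothetical helper: replace empty-string components with None."""
    # Collect the fields whose value is the empty string.
    missing = {name: None
               for name, value in result._asdict().items()
               if value == ''}
    # _replace() is the namedtuple method that returns a modified copy.
    return result._replace(**missing)


parsed = empty_to_none(urlparse('http://example.com/foo'))
```

One caveat with this approach: the parser cannot tell "http://example.com/foo?" (empty query) apart from "http://example.com/foo" (no query) after the fact, so both would end up with query=None here.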

----------
components: Library (Lib)
messages: 316454
nosy: bsaner
priority: normal
severity: normal
status: open
title: Improvement suggestions for urllib.parse.urlparser
type: enhancement

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue33480>
_______________________________________

