[New-bugs-announce] [issue9374] urlparse should parse query and fragment for arbitrary schemes

Nick Welch report at bugs.python.org
Sun Jul 25 00:58:41 CEST 2010

New submission from Nick Welch <mackstann at gmail.com>:

While the netloc and path parts of a URL are scheme-specific, and urlparse can be forgiven for refusing to parse them for unknown schemes, the query and fragment parts are standardized and should be parsed even for unrecognized schemes.

According to Wikipedia:
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
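
To make the generic syntax concrete, here is a minimal sketch that applies the regular expression given in RFC 3986, Appendix B, to a URI with an unknown scheme. The function name generic_split is hypothetical and used only for illustration; it is not part of urlparse.

```python
import re

# Regular expression from RFC 3986, Appendix B ("Parsing a URI Reference
# with a Regular Expression"). It does not depend on the scheme.
_URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

def generic_split(uri):
    """Split a URI into (scheme, authority, path, query, fragment).

    Hypothetical helper for illustration only. Components that are
    absent from the URI are returned as empty strings.
    """
    match = _URI_RE.match(uri)
    # Groups 2, 4, 5, 7 and 9 hold the five components per Appendix B.
    scheme, authority, path, query, fragment = match.group(2, 4, 5, 7, 9)
    return (scheme or '', authority or '', path, query or '', fragment or '')

print(generic_split('myscheme://netloc/path?a=b#frag'))
```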

Here is a demonstration of what urlparse currently does:

>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')

>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag')
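
As a rough sketch of the behavior being requested (not a patch against urlparse), the query and fragment can be separated without knowing anything about the scheme, leaving the scheme-specific remainder untouched. The helper name split_query_fragment is made up for this example.

```python
def split_query_fragment(url):
    """Return (rest, query, fragment) for any URL, whatever its scheme.

    Hypothetical helper for illustration only. The fragment begins at
    the first '#'; the query begins at the first '?' that precedes it.
    Everything before that is returned unparsed as `rest`.
    """
    rest, _, fragment = url.partition('#')
    rest, _, query = rest.partition('?')
    return rest, query, fragment

rest, query, fragment = split_query_fragment('myscheme://netloc/path?a=b#frag')
print(rest, query, fragment)
```

Splitting on '#' first matters: a '?' that appears after the '#' belongs to the fragment, not to the query.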

components: Library (Lib)
messages: 111511
nosy: Nick.Welch
priority: normal
severity: normal
status: open
title: urlparse should parse query and fragment for arbitrary schemes
type: behavior
versions: Python 2.6

Python tracker <report at bugs.python.org>
