[issue1500504] Alternate RFC 3986 compliant URI parsing module
report at bugs.python.org
Wed Nov 3 22:47:36 CET 2010
Nick Coghlan <ncoghlan at gmail.com> added the comment:
Just to be clear, even *I* don't think adding urischemes as it stands is a particularly great idea, and I wrote it. The only reason I haven't closed the issue is that I'd like to see it mined for additional tests in test_urlparse and perhaps even implementation or API enhancements in urllib.parse first.
(The latter becomes a lot more likely if the urischemes implementation passes tests that urllib.parse fails)
I also think that, since I wrote this, the various urllib parsing functions were updated to return named tuple instances with properties, so a lot of the awkwardness of extracting partial values went away. (i.e. returning structured objects already raised the level of the urllib APIs from the "tuple-of-strings" level they used to be sitting at)
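To illustrate the named tuple point, here is a minimal sketch using urllib.parse.urlsplit from the standard library. The URL itself is a made-up example, not one taken from the issue.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only to exercise each component.
parts = urlsplit("http://user@example.com:8080/path/to/page?q=1#frag")

# The result still behaves like the old 5-tuple of strings...
scheme, netloc, path, query, fragment = parts

# ...but it also exposes named fields and derived properties,
# so partial values no longer need manual slicing of netloc.
print(parts.scheme, parts.netloc, parts.path)
print(parts.username, parts.hostname, parts.port)
```

Plain tuple unpacking keeps working for older code, while attributes such as hostname and port cover the cases that used to require hand parsing.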
I do still assert that urischemes is slightly "higher level" than the current incarnation of similar functionality in urllib.parse. Uniform Resource Identifiers are more encompassing than Uniform Resource Locators and Uniform Resource Names, and the new APIs explicitly deal with both kinds of URI. There are subtle differences in the assumptions you're allowed to make when you may have a URN rather than a URL, so I believe the current module sometimes does the wrong thing when given one of the former.
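As a rough sketch of the URL/URN distinction (not a reproduction of any specific urllib.parse bug, which this comment doesn't identify): under RFC 3986 a URI only has an authority component when "//" follows the scheme, so a URN carries everything after the scheme in its path. The two URIs below are illustrative examples only.

```python
from urllib.parse import urlsplit

# Illustrative inputs: one locator with an authority, one name without.
url = urlsplit("http://example.com/catalogue/item")
urn = urlsplit("urn:isbn:0451450523")

# A URL with "//" has an authority (netloc) and therefore a hostname.
print(url.scheme, repr(url.netloc), url.path)

# A URN has no authority at all: netloc is empty, hostname is None,
# and the namespace-specific string ends up in the path.
print(urn.scheme, repr(urn.netloc), urn.path, urn.hostname)
```

Code that assumes every parsed URI has a host or a "/"-rooted path is making a URL-only assumption, which is the kind of thing that can go subtly wrong for URNs.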
That said, it's been a long time since I've needed to remember the details, so I don't recall exactly where the current module gets URI handling wrong (or at least, did back in 2006). The intro to RFC 3986 is a good place to start in learning the differences though - Sir Tim writes good docs :)
Python tracker <report at bugs.python.org>