[Python-Dev] Finally switch urllib.parse to RFC3986 semantics?

Guido van Rossum guido at python.org
Fri Mar 18 03:49:04 CET 2011


On Wed, Mar 16, 2011 at 5:02 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Tue, Mar 15, 2011 at 11:34 PM, Guido van Rossum <guido at python.org> wrote:
>>
>> Can you be specific? What is different between those RFCs?
>
> I finally got around to trying to backport some of the additional
> urljoin tests from http://bugs.python.org/issue1500504 (specifically,
> the additional ones Mike Brown provided), but got tripped up by the
> behavioural changes between the earlier RFCs and RFC 3986 regarding
> the way ".." is handled.

Ah, got it.

> Even in test_urlparse, a bunch of the normative tests from RFC 3986
> are commented out because they fail (by design) when run through
> urllib.parse.urljoin. Some of the additional tests also fail because
> our urljoin implementation has a whitelist of schemes that support
> relative references, whereas 3986 expects relative references to work
> for unknown schemes as well.
>
> There are actually quite a few more terminology changes as well (as
> Senthil pointed out in his email), but it was specifically the failing
> test cases for urljoin semantics that bit me again yesterday.
>
> The problem is that it is quite a lot of work to get fully general URI
> parsing to work correctly, but the overlap with legacy URL parsing is
> large enough that many (most?) use cases in practice work just fine
> with the older RFC semantics.

So would having two different API functions, one legacy and one
conforming, be a problem? Ideally the conforming API's name would not
be something lame like urllib2 but something timeless. :-)
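
For concreteness, here is a rough sketch of the ".." handling that RFC 3986
prescribes (section 5.2.4, "Remove Dot Segments"). It is only an
illustration, not code from urllib.parse; the function name
remove_dot_segments is made up for this example. The case that differs
from the older RFCs is the last one: RFC 1808 lists "../../../g" against
"http://a/b/c/d;p?q" as resolving to "http://a/../g", while RFC 3986 gives
"http://a/g".

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4 (hypothetical helper, not a stdlib API)."""
    output = []  # completed path segments, each keeping its leading "/"
    while path:
        # Step A: drop a leading "../" or "./".
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        # Step B: collapse a leading "/./" or a bare "/." to "/".
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        # Step C: a leading "/../" or bare "/.." removes the last output segment.
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        # Step D: a lone "." or ".." is discarded.
        elif path in (".", ".."):
            path = ""
        # Step E: move the first segment (up to the next "/") to the output.
        else:
            idx = path.find("/", 1)
            if idx == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:idx])
                path = path[idx:]
    return "".join(output)


# The two examples given in RFC 3986 section 5.2.4.
print(remove_dot_segments("/a/b/c/./../../g"))
print(remove_dot_segments("mid/content=5/../6"))
# Merged path for "../../../g" relative to "http://a/b/c/d;p?q":
# the excess ".." segments are dropped rather than kept.
print(remove_dot_segments("/b/c/../../../g"))
```

A conforming join would run something like this on the merged path, so
surplus ".." segments vanish instead of surviving into the result.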

-- 
--Guido van Rossum (python.org/~guido)
