[Python-Dev] Some more comments re new uriparse module, patch 1462525

Fri Jun 9 06:54:18 CEST 2006

John J Lee wrote:
> > http://python.org/sf/1500504
> [...]
> 
> At first glance, looks good.  I hope to review it properly later.
> 
> One point: I don't think there should be any mention of "URL" in the 
> module -- we should use "URI" everywhere (see my comments on Paul's 
> original version for a bit more on this).

Agreed.

Although you've added the test cases from 4Suite and credited me for them,
only a few of the test cases were invented by me.  I'd rather you credited
them to their original sources, as I did.

Also, I believe Graham Klyne has been adding some new cases to his Haskell
tools, but hasn't been updating the other spreadsheet and RDF files in which 
he publishes them in a more usable form. My tests only use what's in the 
spreadsheet, so I've only got 88 out of 99 "testRelative" cases from
http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/libraries/network/tests/URITest.hs
So if you really want to be thorough, grab the missing cases from there.

-

It appears that Paul uploaded a new version of his library on June 3:
http://python.org/sf/1462525
I'm unclear on the relationship between the two now. Are they both up for 
consideration?

-

One thing I forgot to mention in private email is that I'm concerned that the
inclusion of URI reference resolution functionality has exceeded the scope of
this 'urischemes' module that purports to be for 'extensible URI parsing'.  It
is becoming a scheme-aware and general-purpose syntactic processing library
for URIs, and should be characterized as such in its name as well as in its
documentation. 

Even without a new name and more accurately documented scope, people are going
to see no reason not to add the rest of STD 66's functionality to it
(percent-encoding, normalization for testing equivalence, syntax
validation...). As you can see in Ft.Lib.Uri, the latter two are not at all
hard to implement, especially if you use regular expressions. These all fall 
under syntactic operations on URIs, just like reference-resolution.

Percent-encoding gets very hairy with its API details due to application-level
uses that don't jive with STD 66 (e.g. the fuzzy specs and convoluted history
governing application/x-www-form-urlencoded), the nuances of character
encoding and Python string types, and widely varying expectations of users.