[XML-SIG] Re: Ideas for web/ package

Mike Brown mike@skew.org
Wed, 20 Feb 2002 01:17:20 -0700 (MST)

I really wish pipermail would put message IDs in the archives. I could've
properly threaded this.

OK, I have, as Uche mentioned, reviewed the discussion from last year on this
list (links to it are in Uri.py). I have also been following the URI-related
discussion currently happening on xml-dev. This alerted me to the existence of
the JDK 1.4 URI class, which is *very* well documented. However, the notion of
creating an object for every URI is typical Java overkill, IMO, and not a
desirable option for us in 4Suite.

I also reviewed the last several months' worth of archives of the W3C's URI
list (formerly the IETF URI WG list), where there were some interesting
threads. http://lists.w3.org/Archives/Public/uri/

I am posting because Uche's post inviting people to scrutinize
4Suite's (Base)UriResolver class at
was somewhat premature.

What has happened is this:

1. I have recently enhanced 4Suite's Ft.Lib.Uri module to include functions
and Python regular-expressions [1] for performing strict validation on strings
purporting to be URIs or URI references. Feel free to look over this and offer
feedback.

2. I have also added a function for parsing a URI reference into its
components. This is based on the regex in appendix B of RFC 2396, but I took
the liberty of disambiguating the very poorly named 'path' component. Feel
free to look over this, as well.

3. On a private mailing list, I floated a proposal for rewriting Uche's
UriResolver class, which is in that Ft.Lib.Uri module, so that it would be
much more useful and subclassable. However, I have not yet undertaken the work
on this. It's next on my to-do list. So, if you're looking at the Uri.py link
that Uche posted, kindly ignore the BaseUriResolver class in it, as it is
going to become completely unrecognizable here in the next day or two.
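
To make item 2 concrete, here is a minimal sketch of splitting a URI
reference into components with the regex from appendix B of RFC 2396. It is
not the code in Ft.Lib.Uri: the function name split_uri_ref is hypothetical,
and the sketch keeps the RFC's own component names (including 'path') rather
than the renamed ones mentioned above.

```python
import re

# The regular expression given in appendix B of RFC 2396, unchanged.
# Every part is optional, so it matches any string; it splits, it does
# not validate.
_URI_REF = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_ref(uri_ref):
    """Return (scheme, authority, path, query, fragment).

    A component that is absent from the reference is returned as None;
    a component that is present but empty is returned as ''.
    The function name is hypothetical, chosen for this sketch only.
    """
    m = _URI_REF.match(uri_ref)
    # Group numbers follow the RFC: 2=scheme, 4=authority, 5=path,
    # 7=query, 9=fragment.
    return (m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))
```

With the example from the RFC, split_uri_ref applied to
"http://www.ics.uci.edu/pub/ietf/uri/#Related" gives the scheme 'http', the
authority 'www.ics.uci.edu', the path '/pub/ietf/uri/', no query, and the
fragment 'Related'. Distinguishing None from '' matters for references such
as "file:///tmp/x.xml", where the authority is present but empty.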

What precipitated this review of our Uri module is the unfortunate situation
with 'file:' URI references, which often masquerade as baseless, invalid
opaque strings that are nearly impossible to resolve with any confidence.
I'm not certain I can address all the issues, but I am at least taking a stab
at fleshing out a resolver API that people can subclass in the ways I expect
they will want to.
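
As a rough illustration of what a subclassable resolver could look like, here
is a sketch under stated assumptions. It is not the planned rewrite of
BaseUriResolver: the class names, method names, and the split into
normalize/open steps are all hypothetical, and it leans on the standard
library's urljoin and urlopen rather than anything in Ft.Lib.Uri.

```python
import io
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class SketchResolver:
    """Hypothetical base class; each step is a separate overridable method."""

    def normalize(self, uri_ref, base_uri):
        # Make the reference absolute against the base (standard library
        # behaviour, not 4Suite's).
        return urljoin(base_uri, uri_ref)

    def open(self, uri):
        # Default retrieval; subclasses can replace this step alone.
        return urlopen(uri)

    def resolve(self, uri_ref, base_uri=None):
        uri = self.normalize(uri_ref, base_uri) if base_uri else uri_ref
        if not urlsplit(uri).scheme:
            # A relative reference with no base cannot be resolved
            # with any confidence, so refuse rather than guess.
            raise ValueError("no base URI for relative reference %r" % uri_ref)
        return self.open(uri)


class InMemoryResolver(SketchResolver):
    """Hypothetical subclass that serves documents from a dict."""

    def __init__(self, documents):
        self.documents = documents

    def open(self, uri):
        return io.BytesIO(self.documents[uri])
```

A subclass like InMemoryResolver overrides only the retrieval step and
inherits the handling of relative references; a resolver that restricts
schemes or rewrites 'file:' references could override normalize in the same
way.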

   - Mike
  mike j. brown, fourthought.com  |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  personal: http://hyperreal.org/~mike/

 [1] Why do Python docs hyphenate 'regular-expression'?