[Doc-SIG] What counts as a url?

Tony J Ibbs (Tibs) tony@lsl.co.uk
Fri, 16 Mar 2001 11:23:19 -0000


Edward D. Loper wrote:
> So I'm working on adding HREFs to STminus.  They look like this::
>
>     "anchor name":URL

OK.

> Where URL is either a relative URL or an absolute URL.  So I went
> and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt .
> It suggests (if I'm reading it correctly) that we could define
> a URL as::
>
>     ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+
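
A minimal Python sketch of how that character class could drive href recognition, assuming the `"anchor name":URL` form quoted above. The pattern name `HREF` and the function `find_hrefs` are hypothetical, and the rule that the anchor may not contain a double quote is an assumption. The hyphen is moved to the end of the class so it is unambiguously literal, and the quoted `( ... | ... )+` notation is written as a non-capturing group:

```python
import re

# Assumed URL grammar: the RFC 2396-derived character class quoted above
# (with '-' moved to the end so it is literal), or a %XX hex escape.
URL_CHARS = r"(?:[a-zA-Z0-9_.!~*'();/?:@&=+$,#-]|%[0-9a-fA-F]{2})+"

# Hypothetical pattern for the STminus href form "anchor name":URL.
# Assumption: the anchor text itself contains no double quotes.
HREF = re.compile(r'"(?P<anchor>[^"]+)":(?P<url>' + URL_CHARS + r')')


def find_hrefs(text):
    """Return (anchor, url) pairs for every "anchor":URL found in text."""
    return [(m.group("anchor"), m.group("url")) for m in HREF.finditer(text)]


if __name__ == "__main__":
    print(find_hrefs('See "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt for details.'))
    # '.' is in the character class, so a sentence-ending period is swallowed:
    print(find_hrefs('See "this":http://url.'))
```

Because '.' is a legal URL character under this grammar, the second call returns `http://url.` with the period attached, which is exactly the case discussed further down.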

Ah - do we want URLs or URIs? I can never remember the difference.

I am loath to stop people from using the full generality of "pointers
to the web", and this means delving into nasty stuff. See

	http://www.foad.org/~abigail/Perl/url2.html

for some interesting details. I think we need to avoid that level of
complexity.

> Should we use that regexp for URLs?  Or perhaps we should go for
> simplicity, and say that the regexp ends at whitespace::
>
>     [^\s]+
>
> In either case, we'll have to be careful to say::
>
>     See "this":http://url .
>
> instead of::
>
>     See "this":http://url.

Hmm, that breaks with ST tradition, and indeed my code treats that final
"." as not being part of the URI. Hmm.
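
For comparison, a minimal Python sketch of the simpler "ends at whitespace" rule (`[^\s]+`), combined with an optional step that drops one trailing "." in the spirit of the ST habit described above. This is not the actual code mentioned here; `HREF_WS`, `find_hrefs_ws` and the `strip_final_period` parameter are hypothetical names used only for illustration:

```python
import re

# Simpler rule from the thread: the URL runs until the next whitespace.
# Assumption: the anchor text contains no double quotes.
HREF_WS = re.compile(r'"(?P<anchor>[^"]+)":(?P<url>[^\s]+)')


def find_hrefs_ws(text, strip_final_period=True):
    """Return (anchor, url) pairs, where each URL ends at whitespace.

    If strip_final_period is true (a hypothetical option), a single
    trailing '.' is treated as sentence punctuation and removed.
    """
    pairs = []
    for m in HREF_WS.finditer(text):
        url = m.group("url")
        # Only '.' is handled; a trailing ',' or ')' would still be kept.
        if strip_final_period and len(url) > 1 and url.endswith("."):
            url = url[:-1]
        pairs.append((m.group("anchor"), url))
    return pairs


if __name__ == "__main__":
    print(find_hrefs_ws('See "this":http://url.'))
    print(find_hrefs_ws('See "this":http://url.', strip_final_period=False))
```

The trade-off is visible directly: with stripping enabled, `See "this":http://url.` reads naturally, but a URL that genuinely ends in '.' can no longer be written at all; with stripping disabled, the writer has to leave a space before the full stop.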

> Is that a problem?  If so, what can we do about it?
> (Keep in mind that it *is* acceptable to have a URL that ends in
> a '.'.)

I'll think on it, for my part (and read some specs).

> Of course, I don't think people will be including HREFs in their
> documentation much, anyway. So the main issue for most people
> will just be that they can't use '":' in certain environments.

Erm, I wouldn't bet on that. And we *are* trying to retain
compatibility/usefulness as a tool for working on text files as well,
remember, where this sort of thing is more likely.

Tibs (slightly worriedly)

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)