[Doc-SIG] What counts as a url?
M.-A. Lemburg
mal@lemburg.com
Fri, 16 Mar 2001 18:08:37 +0100
"Edward D. Loper" wrote:
>
> So I'm working on adding HREFs to STminus. They look like this::
>
> "anchor name":URL
>
> Where URL is either a relative URL or an absolute URL. So I went
> and looked up "RFC 2396":http://www.w3.org/Addressing/rfc2396.txt .
> It suggests (if I'm reading it correctly) that we could define
> a URL as::
>
> ([a-zA-Z0-9-_.!~*'();/?:@&=+$,#] | %[0-9a-fA-F][0-9a-fA-F])+
>
> Should we use that regexp for URLs? Or perhaps we should go for
> simplicity, and say that the regexp ends at whitespace::
>
> [^\s]+
>
> In either case, we'll have to be careful to say::
>
> See "this":http://url .
>
> instead of::
>
> See "this":http://url.
>
> (the '.' gets included in the second URL). Is that a problem? If
> so, what can we do about it? (Keep in mind that it *is* acceptable
> to have a URL that ends in a '.'.)
>
> Of course, I don't think people will be including HREFs in their
> documentation much anyway. So the main issue for most people
> will just be that they can't use '":' in certain environments.
>
> Ideas/thoughts?
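A minimal Python sketch of the two candidate patterns, applied to the trailing-period example. The regexps are transcribed from the message above (the RFC 2396 one with the grammar-style spaces around '|' dropped and the '-' escaped); the helper extract_url and the way it locates the '":' separator are assumptions for illustration, not STminus code.

```python
import re

# RFC 2396 characters (unreserved, reserved, plus '#' for fragments)
# or a %XX escape, written as a real Python regex.
RFC2396_URL = re.compile(r"(?:[a-zA-Z0-9\-_.!~*'();/?:@&=+$,#]|%[0-9a-fA-F]{2})+")

# The simpler alternative: everything up to the next whitespace.
WHITESPACE_URL = re.compile(r"[^\s]+")


def extract_url(text, url_pattern):
    """Return the URL after the first '":' in text (hypothetical helper).

    Assumes text contains an "anchor name":URL reference; str.index
    raises ValueError otherwise.
    """
    start = text.index('":') + 2
    # Anchor the URL pattern right after the '":' separator.
    match = url_pattern.match(text, start)
    return match.group(0) if match else None


for text in ('See "this":http://url .', 'See "this":http://url.'):
    print(extract_url(text, RFC2396_URL), extract_url(text, WHITESPACE_URL))
# http://url http://url
# http://url. http://url.
```

Both patterns keep the final '.' in the second case, which is the ambiguity raised above. They differ only on characters outside the RFC 2396 set: for '"a":http://x/y"z' the first pattern stops at the '"', while the second returns 'http://x/y"z'.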
FYI, I use this RE in my apps:
r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)'
I don't think it makes sense to include schemes which are not
supported by your everyday browser, so only the most common ones
are included.
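A quick check of how the pattern above behaves, using only Python's standard re module; the find_urls wrapper and the sample strings are made up for illustration. Two details of the pattern matter here. Inside the character class, "#-_" is parsed as the range from '#' (0x23) to '_' (0x5F), which is what admits '/', ':', '?', '=' and '%'. The "\b" before the optional '/' makes the match back off trailing punctuation, so a sentence period is dropped, at the cost of URLs that really do end in '.'. Because every scheme must be followed by "://", a plain mailto:user@host address is not matched.

```python
import re

# The pattern from the message above, unchanged.
# Inside the character class, "#-_" is a range (0x23..0x5F), so it
# also covers "/", ":", "?", "=", "%", digits and upper-case letters.
URL_RE = re.compile(r'\b((?:http|ftp|https|mailto)://[\w@&#-_.!~*();]+\b/?)')


def find_urls(text):
    """Return the URLs the pattern finds in text (hypothetical wrapper)."""
    # One capturing group, so findall() returns a list of plain strings.
    return URL_RE.findall(text)


# The final "\b" backs off the sentence period; "\b/?" then
# re-admits a trailing slash.
print(find_urls('See "this":http://url.'))               # ['http://url']
print(find_urls('See http://www.lemburg.com/python/.'))  # ['http://www.lemburg.com/python/']
# Query strings and escapes survive thanks to the "#-_" range.
print(find_urls('http://example.com/a?b=1%20c'))         # ['http://example.com/a?b=1%20c']
# No "://" after the scheme, so nothing is found.
print(find_urls('mailto:mal@lemburg.com'))               # []
```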
--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Pages: http://www.lemburg.com/python/