Attention, hyperlinkers: inference of active text
nelson at monkey.org
Sat Jun 19 17:38:16 CEST 2004
If I understand your question correctly, you're looking for a way to
guess what part of an English sentence is a URL. The problem you're
facing is trailing punctuation characters.
I.e., these are good:
Look at http://bamboo.org !
It is on my drive as file:\Program%20Files\Perl\odysseus.exe
And these are bad:
Look at http://bamboo.org!
The secret is in "file:\My Download Folder\dont_look.txt".
If you want to make life as easy as possible for your authors, you
need some good heuristics. You need to guess where the URL starts and
ends. My terminal emulator (SecureCRT) does a pretty good job of this.
Nat Friedman's dingus also did this trick a while ago - I can't find it
easily now, but I think the code might be part of rxvt or Gnome.
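For what it's worth, here is a rough sketch of the kind of heuristic I
mean. It is not what SecureCRT or the dingus code actually does; the
scheme list, the TRAILING set and the name find_urls are all my own
assumptions. The idea: start at a known scheme, run to the next
whitespace, then peel off punctuation that more likely ends the
sentence than the URL.

```python
import re

# Assumption: the schemes worth recognising. Extend to taste.
URL_PATTERN = re.compile(r"\b((?:https?|ftp|file):)(\S+)")

# Assumption: characters that more often close a sentence than a URL.
TRAILING = ".,;:!?\"')]}>"


def find_urls(text):
    """Guess the URLs in a piece of English text.

    A candidate starts at a known scheme and runs to the next
    whitespace; trailing punctuation is then stripped from it.
    """
    urls = []
    for match in URL_PATTERN.finditer(text):
        scheme, rest = match.groups()
        rest = rest.rstrip(TRAILING)
        if rest:  # skip a bare scheme such as "http:" followed by "!!"
            urls.append(scheme + rest)
    return urls


if __name__ == "__main__":
    print(find_urls("Look at http://bamboo.org!"))
    print(find_urls(r'The secret is in "file:\My Download Folder\dont_look.txt".'))
```

This handles the bamboo.org case, but the second call still goes
wrong: the guess stops at the first space, so you get only the start
of the path. A URL that really does end in ")" or "." gets clipped
too. That is the price of guessing.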
Your other option is to require folks to delimit URLs with something
like <http://bamboo.org>. This is pretty painless and common, but only
you can know whether your users will accept it.
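If you go the delimiter route, the matching becomes almost trivial. A
minimal sketch, assuming angle brackets as the delimiters and a
generic "scheme:" prefix so that things like <b> are not mistaken for
URLs (the name find_delimited_urls is mine, purely for illustration):

```python
import re

# A scheme (letter, then letters/digits/+/./-), a colon, then anything
# up to the closing bracket. Spaces inside the brackets are allowed.
DELIMITED_URL = re.compile(r"<([A-Za-z][A-Za-z0-9+.-]*:[^<>]+)>")


def find_delimited_urls(text):
    """Return every URL the author wrapped in <...>."""
    return [url.strip() for url in DELIMITED_URL.findall(text)]


if __name__ == "__main__":
    print(find_delimited_urls("Look at <http://bamboo.org>!"))
    print(find_delimited_urls(r"The secret is in <file:\My Download Folder\dont_look.txt>."))
```

Because the author marks the end explicitly, trailing punctuation and
embedded spaces stop being a problem.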
More information about the Python-list