[Tutor] Re: Regex

Andrei project5 at redrival.net
Mon Aug 25 21:01:58 EDT 2003


Thanks, it *almost* helps, but I'm not trying to harvest the links. The 
issue is that I do *not* want to match URLs if they're between <a> 
tags, nor if they're an attribute value of some tag (img, a, link, whatever).

Perhaps I should have explained my goal more clearly: I want to take a 
piece of text which may or may not contain HTML tags and turn any piece 
of text which is a URL, but NOT already a link, into a link. E.g.:

   go to <a href="http://home.com">http://home.com</a>. [1]
   go <a href="http://home.com">home</a>. [2]

should remain unmodified, but

   go to http://home.com [3]

should be turned into [1]. That negative lookbehind can do the job in 
the large majority of cases (by not matching URLs if they're 
preceded by single or double quotes or by ">"), but not always, since 
Python's re module only accepts fixed-width lookbehind assertions, so it 
can't check whether the URL sits somewhere inside an <a> element. I think 
one of the parser modules might be able to help (?), but no matter how much 
I try, I can't get the hang of them, while I do somewhat understand regexes.
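A minimal sketch of both approaches, assuming Python 3 and its standard `html.parser` module. The names `URL_RE`, `NAIVE_RE`, `naive_linkify`, `Linkifier` and `linkify` are hypothetical, and the URL pattern is a simplification that only knows http, https and ftp. `naive_linkify` is the lookbehind idea: skip a URL if the single character before it is a quote, `=` or `>`. It leaves [1] and [2] alone and fixes [3], but it still links a URL that appears later in an anchor's text. `linkify` lets the parser split the input into tags and text, copies tags through unchanged, and only rewrites text that is not inside an `<a>` element.

```python
import html
import re
from html.parser import HTMLParser

# Simplified URL pattern (an assumption): scheme, then everything up to
# whitespace, a quote or an angle bracket; trailing punctuation such as
# the "." ending a sentence is left out of the match.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"']*[^\s<>\"'.,;:!?)]")

# The lookbehind idea from the post: skip a URL whose preceding character is
# a quote, "=" or ">". Python's re needs fixed-width lookbehind, so it can
# only inspect that one character.
NAIVE_RE = re.compile(r"(?<![\"'=>])" + URL_RE.pattern)


def naive_linkify(text):
    return NAIVE_RE.sub(lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), text)


class Linkifier(HTMLParser):
    """Re-emit the input, wrapping bare URLs in text outside <a>...</a>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # text arrives with &amp; etc. decoded
        self.out = []
        self.a_depth = 0  # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.a_depth += 1
        self.out.append(self.get_starttag_text())  # copy the tag verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "a" and self.a_depth:
            self.a_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.a_depth:  # already link text: just re-escape it
            self.out.append(html.escape(data, quote=False))
            return
        pos = 0
        for m in URL_RE.finditer(data):
            self.out.append(html.escape(data[pos:m.start()], quote=False))
            url = html.escape(m.group(0))
            self.out.append('<a href="%s">%s</a>' % (url, url))
            pos = m.end()
        self.out.append(html.escape(data[pos:], quote=False))

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def linkify(text):
    parser = Linkifier()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

For example, `linkify('go to http://home.com')` gives `go to <a href="http://home.com">http://home.com</a>`, while [1] and [2] come back unchanged. Because the parser decodes character references before `handle_data` sees the text, the sketch re-escapes everything it emits: `&amp;` survives, but something like `&nbsp;` comes back as a literal non-breaking space. It also does not special-case `<script>` or `<style>` content.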

Andrei




