[Tutor] Re: Regex
Kirk Bailey
idiot1 at netzero.net
Tue Aug 26 21:38:14 EDT 2003
Perhaps I can help.
I am writing a wiki. Something very much like this comes up in converting the
page into HTML for the recipient's browser. If one types
http://www.tinylist.org
into the page, it should be converted into a link. The rendered result is
<a href="http://www.tinylist.org">http://www.tinylist.org</a>
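Here is a minimal Python sketch of that conversion. It assumes a deliberately simple URL pattern, and URL_RE and linkify are hypothetical names that do not come from wikinehesa:

```python
import re

# Hypothetical, deliberately simple pattern: "http://" or "https://" followed
# by a run of characters that are not whitespace, quotes, or angle brackets.
URL_RE = re.compile(r'https?://[^\s"<>]+')

def linkify(text):
    """Wrap every bare URL found in text in an <a> element."""
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), text)

print(linkify("http://www.tinylist.org"))
# <a href="http://www.tinylist.org">http://www.tinylist.org</a>
```

On its own, this pattern would also swallow a trailing period, and it would link a URL sitting inside an existing tag's attribute. That second problem is the one the wiki has to avoid.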
But if, while parsing the page, it finds something that already appears to be an
operational link, it should leave it alone. This is because a different
function turns some wikicode into an image tag with a src attribute pointing at
the host's website, and that tag must not be broken.
The critical difference is one simple character: the " at the start of the address.
<img src="http://www.tinylist.org/images/wikinehesalogo2.gif">
http://www.tinylist.org/images/wikinehesalogo2.gif
The second is converted into a link. The first is disabled by
turning the < and > into &lt; and &gt; respectively.
&lt;img src="http://www.tinylist.org/images/wikinehesalogo2.gif"&gt;
will not operate when examined by the browser. Moreover, the link constructor
will not turn the address into a hotlink, because of the leading '"'.
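The two steps might look roughly like the sketch below. It rests on my own assumptions and is not the wikinehesa code. It uses Python 3's html.escape, which did not exist in 2003 (cgi.escape filled that role then). BARE_URL_RE and render are hypothetical names.

```python
import html
import re

# Rough URL pattern (an assumption, not taken from wikinehesa). The negative
# lookbehind (?<!") refuses to start a match right after a double quote,
# so a URL inside an attribute such as src="..." is skipped.
BARE_URL_RE = re.compile(r'(?<!")https?://[^\s"<>]+')

def render(wikitext):
    # Step 1: disable raw HTML by turning < into &lt; and > into &gt;.
    # quote=False leaves double quotes alone, which step 2 relies on.
    escaped = html.escape(wikitext, quote=False)
    # Step 2: turn the remaining bare URLs into links.
    return BARE_URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), escaped)

page = ('<img src="http://www.tinylist.org/images/wikinehesalogo2.gif">\n'
        'http://www.tinylist.org/images/wikinehesalogo2.gif')
print(render(page))
# &lt;img src="http://www.tinylist.org/images/wikinehesalogo2.gif"&gt;
# <a href="http://www.tinylist.org/images/wikinehesalogo2.gif">http://www.tinylist.org/images/wikinehesalogo2.gif</a>
```

The quote=False argument matters here. With the default quote=True, html.escape would turn the " into &quot;, and the lookbehind would no longer see it.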
Here is the source code of the program. Please feel free to steal as needed.
http://www.tinylist.org/wikinehesa.txt
To witness it in action, click this:
http://www.tinylist.org/cgi-bin/wikinehesa.py
Notice there is an image displayed in the body of the page, as well as a hotlink
back to the main website. To examine the page's wikicode source, just click the
EDIT THIS PAGE button.
I hope this is of some help.
Andrei wrote:
> Thanks, it *almost* helps, but I'm not trying to harvest the links. The
> issue is that I do *not* want to get URLs if they're in between <a>
> tags, nor if they're an attribute to some tag (img, a, link, whatever).
>
> Perhaps I should have explained my goal more clearly: I wish to take a
> piece of text which may or may not contain HTML tags and turn any piece
> of text which is NOT a link, but is a URL, into a link. E.g.:
>
> go to <a href="http://home.com">http://home.com</a>. [1]
> go <a href="http://home.com">home</a>. [2]
>
> should remain unmodified, but
>
> go to http://home.com [3]
>
> should be turned into [1]. That negative lookbehind can do the job in
> the large majority of the cases (by not matching URLs if they're
> preceded by single or double quotes or by ">"), but not always, since
> Python requires a lookbehind to be of fixed length. I think one of the
> parser modules might be able to help (?) but regardless of how much I
> try, I can't get the hang of them, while I do somewhat understand regexes.
>
> Andrei
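On the fixed-length restriction: Python's re module raises re.error for a lookbehind whose alternatives have different widths, such as (?<!href="|>). One pure-regex workaround avoids lookbehind altogether. Match the things that must be skipped (whole <a>...</a> elements and any other tag) as earlier alternatives, hand them back unchanged, and link only what the URL alternative catches. This is a sketch, not something proposed in the thread. TOKEN_RE and linkify_html are hypothetical names, and the approach assumes reasonably well-formed HTML, since a stray < in plain text would confuse it.

```python
import re

# At each position the alternatives are tried left to right, so anything
# consumed by the first two branches is never seen by the URL branch.
TOKEN_RE = re.compile(r'''
    (?P<anchor><a\b[^>]*>.*?</a>)   # an existing link, kept whole
  | (?P<tag><[^>]*>)                # any other tag, attributes included
  | (?P<url>https?://[^\s"'<>]+)    # a bare URL in ordinary text
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)

def linkify_html(text):
    """Link bare URLs in text, leaving existing links and tags untouched."""
    def replace(m):
        url = m.group('url')
        if url is None:
            return m.group(0)  # an anchor or another tag: return it unchanged
        return '<a href="{0}">{0}</a>'.format(url)
    return TOKEN_RE.sub(replace, text)

print(linkify_html('go to http://home.com'))
# go to <a href="http://home.com">http://home.com</a>
```

The anchor alternative comes before the plain-tag one, so the visible text of an existing link is consumed together with its tags. That handles example [1]. If the input cannot be trusted to be tidy, the standard library's html.parser module is probably the sturdier route.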
--
Cheers!
Kirk D Bailey
+ think +
http://www.howlermonkey.net +-----+ http://www.tinylist.org
http://www.listville.net | BOX | http://www.sacredelectron.org
Thou art free"-ERIS +-----+ 'Got a light?'-Prometheus
+ think +
Fnord.
More information about the Tutor mailing list