HTML Parser

Kragen Sitaker kragen at dnaco.net
Sun Dec 31 00:02:52 EST 2000


In article <qnkito1p9jj.fsf at arbutus.physics.mcmaster.ca>,
David M. Cooke <cookedm at physics.mcmaster.ca> wrote:
>At some point, kragen at dnaco.net (Kragen Sitaker) wrote:
>> In a string like "x<a>b<c>d", this will match "<a>b<c>", because the .*
>> matches "a>b<c".  This explains your problem.
>> 
>> Fixing it is harder.
>
>Not that hard: use the pattern '<.*?>'.
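
You can check the difference between the greedy and non-greedy patterns with Python's re module. This is a minimal sketch that uses the sample string from the quoted message. The variable names are illustrative only.

```python
import re

s = "x<a>b<c>d"

# Greedy: .* runs to the end of the string, then backtracks to the LAST '>'
greedy = re.findall(r"<.*>", s)

# Non-greedy: .*? stops at the FIRST '>' after each '<'
lazy = re.findall(r"<.*?>", s)

print(greedy)  # ['<a>b<c>']
print(lazy)    # ['<a>', '<c>']
```
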

Well, he wants to upcase his tag names, but '<.*?>' still matches the
whole tag, including every attribute name and attribute value, so if he
upcases the match, his URLs will get upcased too.  This is bad, and
fixing it *is* harder.
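
One way around this, for simple cases, is to match only the tag name that follows '<' or '</' and leave the rest of the tag alone. Treat this as a rough sketch under stated assumptions, not a general solution. It assumes '<' never appears unescaped inside attribute values or text. It does not understand comments, <script> contents, or CDATA. The names TAG_NAME and upcase_tag_names are hypothetical.

```python
import re

# Hypothetical helper: matches '<' or '</' plus a tag name only.
# Attributes are never consumed, so their names and values stay as written.
# Assumes no unescaped '<' inside attribute values or text.
TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")

def upcase_tag_names(html):
    """Upcase element names only, e.g. <a href="x"> -> <A href="x">."""
    return TAG_NAME.sub(lambda m: "<" + m.group(1) + m.group(2).upper(), html)

print(upcase_tag_names('<a href="http://example.com/Path">link</a>'))
# <A href="http://example.com/Path">link</A>
```

A real HTML parser is more robust than any regex here, at the cost of having to reassemble the original markup.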

-- 
<kragen at pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
Perilous to all of us are the devices of an art deeper than we possess
ourselves.
       -- Gandalf the White [J.R.R. Tolkien, "The Two Towers", Bk 3, Ch. XI]





More information about the Python-list mailing list