How to convert markup text to plain text in python?
python.list at tim.thechases.com
Fri Feb 1 17:54:19 CET 2008
>> Well, if all you want to do is remove everything from a "<" to a
>> ">", you can use
>> >>> s = "<B>Today</B> is <U>Friday</U>"
>> >>> import re
>> >>> r = re.compile('<[^>]*>')
>> >>> print(r.sub('', s))
>> Today is Friday
[Tim's ramblings about pathological cases snipped]
> The real answer to this question is "learn how to use Beautiful Soup" --
> see http://www.crummy.com/software/BeautifulSoup/
Yes, for more pathological cases, BS does a great job of parsing.
However, as BS isn't batteries-included [Aside: BS and pyparsing
are two common solutions to problems that would make great
additions to the standard library], using a RE to make a
best-effort guess is a good first approximation of a solution
without needing to download extra packages--no matter how useful
those extra packages may be.
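
To make the trade-off concrete, here is a rough sketch that puts the quoted RE next to a batteries-included parser. The html.parser module (Python 3 naming) and the helper names strip_tags_re, strip_tags_parser and _TextCollector are assumptions added for illustration, not something from the thread, and neither function is meant as a replacement for Beautiful Soup on truly broken markup.

```python
import re
from html.parser import HTMLParser

# The RE from the quoted example: everything from a "<" to the next ">".
TAG_RE = re.compile(r'<[^>]*>')


def strip_tags_re(text):
    """Best-effort guess: delete each '<...>' run and keep the rest."""
    return TAG_RE.sub('', text)


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects only the text between tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags_parser(text):
    """Standard-library alternative that actually parses the markup."""
    collector = _TextCollector()
    collector.feed(text)
    collector.close()  # flush any text still buffered by the parser
    return ''.join(collector.chunks)


if __name__ == '__main__':
    s = "<B>Today</B> is <U>Friday</U>"
    print(strip_tags_re(s))
    print(strip_tags_parser(s))
    # A mildly pathological case: a ">" inside a quoted attribute value
    # ends the RE match early and leaves debris behind.
    print(repr(strip_tags_re('<a title=">">link</a>')))
```

On the simple input both functions give the same answer. The RE version starts to leave junk as soon as a ">" appears inside an attribute value, and it never decodes entities such as &amp;, which is about where a real parser starts to earn its keep.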