xHTML/XML to Unicode (and back)
Fredrik Lundh
fredrik at pythonware.com
Tue Jan 24 08:46:46 EST 2006
Robin Haswell wrote:
> I'm currently screenscraping some Swedish site, and I need a method to
> convert XML entities (&amp; etc, plus &#100; etc) to Unicode characters.
> I'm sure one of Python's myriad XML processors can do this but I can't
> find which one.
>
> Can anyone make any suggestions?
any decent html-aware screen scraper library should be able to do
this for you.
if you've already extracted the strings, the strip_html function on
this page might be what you need:
http://effbot.org/zone/re-sub.htm#strip-html
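
for illustration, here is a minimal sketch of the general technique (a
re.sub callback that resolves each reference), written for Python 3. it is
not the code from the page above; the function names unescape_entities and
escape_non_ascii are made up for this example, and only the standard
library is assumed.

```python
import re
from html.entities import name2codepoint


def unescape_entities(text):
    """Replace numeric character references and named HTML entities
    with the corresponding Unicode characters."""

    def fixup(match):
        ref = match.group(1)
        try:
            if ref.startswith(("#x", "#X")):
                # hexadecimal character reference, e.g. &#xe5;
                return chr(int(ref[2:], 16))
            if ref.startswith("#"):
                # decimal character reference, e.g. &#229;
                return chr(int(ref[1:]))
            # named entity, e.g. &aring;
            return chr(name2codepoint[ref])
        except (KeyError, ValueError, OverflowError):
            # unknown or malformed reference: leave it untouched
            return match.group(0)

    return re.sub(r"&(#?\w+);", fixup, text)


def escape_non_ascii(text):
    """Go the other way: turn non-ASCII characters into numeric
    character references. Markup characters (&, <, >) are NOT escaped."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

on Python 3.4 and later, html.unescape in the standard library covers the
decoding direction as well, so the callback version is mainly useful when
you want to control how unknown references are handled.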
</F>