xHTML/XML to Unicode (and back)
rob at digital-crocus.com
Tue Jan 24 09:34:34 EST 2006
On Tue, 24 Jan 2006 14:46:46 +0100, Fredrik Lundh wrote:
> Robin Haswell wrote:
>> I'm currently screenscraping some Swedish site, and I need a method to
>> convert XML entities (&amp; etc, plus &#100; etc) to Unicode characters.
>> I'm sure one of Python's myriad XML processors can do this, but I can't
>> find which one.
>> Can anyone make any suggestions?
> any decent html-aware screen scraper library should be able to do
> this for you.
I'm using BeautifulSoup and it appears that it doesn't. I'd also like to
know the answer to this for when I do screenscraping with regular
expressions.
> if you've already extracted the strings, the strip_html function on
> this page might be what you need:
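
For reference, here is a minimal sketch of the entity-to-Unicode conversion
asked about above, and the reverse direction from the subject line. It is not
the strip_html function mentioned in the quoted text. It assumes a modern
Python 3 standard library (html.unescape and html.escape, which did not exist
when this thread was written), and the function names decode_entities,
decode_entities_re and encode_entities are made up for illustration. The
regex variant is only an approximation for already-extracted strings: it
leaves unknown named entities untouched and does no range checking on
numeric references.

```python
import html
import re
from html.entities import name2codepoint

def decode_entities(text):
    """Turn &amp;, &#100;, &#x64;, &aring; etc. into Unicode characters."""
    return html.unescape(text)

# Hypothetical regex-only variant, for strings pulled out with regular
# expressions rather than a parser.
_ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def decode_entities_re(text):
    def replace(match):
        ref = match.group(1)
        if ref[:2] in ("#x", "#X"):
            return chr(int(ref[2:], 16))      # hexadecimal character reference
        if ref.startswith("#"):
            return chr(int(ref[1:]))          # decimal character reference
        codepoint = name2codepoint.get(ref)   # named entity, e.g. "aring"
        # Leave unknown names exactly as they appeared in the input.
        return chr(codepoint) if codepoint is not None else match.group(0)
    return _ENTITY_RE.sub(replace, text)

def encode_entities(text):
    """The way back: escape &, <, > and write non-ASCII as &#NNN; references."""
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

if __name__ == "__main__":
    sample = "&amp; &#100; &aring;"
    print(decode_entities(sample))
    print(decode_entities_re(sample))
    print(encode_entities("sm\u00f6rg\u00e5s & br\u00f6d"))
```

The first function is probably all that is needed in most cases; the regex
version only shows what is going on underneath.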