UTF8 & HTMLParser
Jan Danielsson
jan.danielsson at gmail.com
Thu Nov 30 23:34:31 EST 2006
Jan Danielsson wrote:
> Hello all,
>
> I'm writing a Python script which fetches an HTML page (using wget),
> and then parses the retrieved page using a custom htmllib HTMLParser.
>
> The page I fetch is encoded in utf8, and my text-handler currently
> looks like this:
>
> def handle_data(self, text):
>     if self.inOption:
>         self.currentName = text
>
> However, I would like to convert the "text" (which is utf8) to
> latin-1. How do I do that? I've been trying to figure it out for some
> time now, and I'm just getting frustrated. :-(
I should have mentioned: the problem is that I can't find a way to
make Python understand that "text" (the argument above) is in fact
already UTF-8.
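
For illustration, here is a minimal sketch of the usual two-step conversion: decode the UTF-8 bytes to a Unicode string, then encode that string as Latin-1. It assumes "text" reaches handle_data as a raw UTF-8 byte string; utf8_to_latin1 is a hypothetical helper name, not part of htmllib, and the choice of "replace" for unmappable characters is also an assumption.

```python
def utf8_to_latin1(raw, errors="replace"):
    """Convert a UTF-8 byte string to a Latin-1 byte string.

    Hypothetical helper (not part of htmllib). Characters that have
    no Latin-1 equivalent are handled according to `errors`; with
    "replace" they become "?".
    """
    # Step 1: tell Python the bytes are UTF-8 by decoding them.
    unicode_text = raw.decode("utf-8")
    # Step 2: encode the resulting Unicode string as Latin-1.
    return unicode_text.encode("latin-1", errors)


# "cafe" with an acute accent on the e, encoded as UTF-8 (two bytes for the e).
utf8_bytes = b"caf\xc3\xa9"
latin1_bytes = utf8_to_latin1(utf8_bytes)
```

Inside the handler this would amount to something like self.currentName = utf8_to_latin1(text), again assuming text is a byte string at that point.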
--
Kind Regards,
Jan Danielsson
Te audire non possum. Musa sapientum fixa est in aure.