elementtree w/utf8
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Thu Oct 25 17:34:05 EDT 2007
On Thu, 25 Oct 2007 17:15:36 -0400, Tim Arnold wrote:
> Hi, I'm getting the by-now-familiar error:
> return codecs.charmap_decode(input,errors,decoding_map)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position
> 4615: ordinal not in range(128)
>
> the html file I'm working with is in utf-8, I open it with codecs, try to
> feed it to TidyHTMLTreeBuilder, but no luck. Here's my code:
> from elementtree import ElementTree as ET
> from elementtidy import TidyHTMLTreeBuilder
>
> fd = codecs.open(htmfile,encoding='utf-8')
> tidyTree = TidyHTMLTreeBuilder.TidyHTMLTreeBuilder(encoding='utf-8')
> tidyTree.feed(fd.read())
> self.tree = tidyTree.close()
> fd.close()
>
> what am I doing wrong? Thanks in advance.
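You feed decoded data to `TidyHTMLTreeBuilder`. As the `encoding`
argument suggests, this class wants bytes, not unicode. `codecs.open`
already decodes the file, so the builder receives a unicode string,
which Python first has to turn back into bytes with the default ASCII
codec. That step fails on u'\xa9'. Decoding twice doesn't work. Open
the file in binary mode and feed the raw bytes instead.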
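
A minimal sketch of the difference, not the exact `elementtidy` call. It assumes the standard library's `xml.etree.ElementTree.XMLParser` as a stand-in for `TidyHTMLTreeBuilder` (both take an `encoding` argument and a `feed()`/`close()` pair), and the sample document and temporary file are made up for illustration. The last step spells out the implicit ASCII encode that raises the error in the question.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document: UTF-8 bytes containing U+00A9 (copyright sign).
raw = u"<html><body><p>\u00a9 2007</p></body></html>".encode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "sample.html")
with open(path, "wb") as fd:
    fd.write(raw)

# Read in binary mode so the data stays undecoded bytes.
with open(path, "rb") as fd:
    data = fd.read()

# Stand-in for TidyHTMLTreeBuilder: the parser is told the encoding
# and does the decoding itself.
parser = ET.XMLParser(encoding="utf-8")
parser.feed(data)
root = parser.close()
text = root.find("body/p").text

# Spelled out explicitly: decoding first and then encoding with the
# ASCII codec fails on the non-ASCII character.
try:
    data.decode("utf-8").encode("ascii")
    implicit_ok = True
except UnicodeEncodeError:
    implicit_ok = False
```

Applied to the code in the question, the same idea would be to replace `codecs.open(htmfile, encoding='utf-8')` with a plain `open(htmfile, 'rb')` and leave the rest unchanged.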
Ciao,
Marc 'BlackJack' Rintsch