ElementTree cannot parse UTF-8 Unicode?
Fredrik Lundh
fredrik at pythonware.com
Wed Jan 19 15:50:57 EST 2005
Erik Bethke wrote:
> I am getting a "not well-formed" error at the beginning of the Korean
> text in the second example. Am I doing something wrong with how I am
> encoding my Korean? Do I need more of a wrapper around it than simple
> quotes? Is there some sort of XML syntax for indicating a Unicode
> string, or does the ElementTree library just not support reading
> Unicode?
XML is Unicode, and ElementTree supports all common encodings just
fine (including UTF-8).
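
As a quick check of that claim, here is a minimal sketch in present-day Python 3 using the standard-library xml.etree.ElementTree (the post itself predates this and would have used the standalone elementtree package). It assumes the document is built in memory and really is encoded as UTF-8; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The Korean text from the thread, kept exactly as posted.
greeting = "어녕하세요!"

# Build the document as text, then encode it as UTF-8 so the bytes
# match what the XML declaration promises.
xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Vocab>\n"
    '<Word L1="' + greeting + '"></Word>\n'
    "</Vocab>\n"
)
xml_bytes = xml_text.encode("utf-8")

# Parse from bytes; the parser decodes them according to the declaration.
root = ET.fromstring(xml_bytes)
parsed = root.find("Word").get("L1")

# The attribute comes back as an ordinary Unicode string.
print(parsed == greeting)
```

If this works but parsing a file on disk does not, the file's actual bytes are the next thing to look at.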
> this one fails:
> <?xml version="1.0" encoding="UTF-8"?>
> <Vocab>
> <Word L1="어녕하세요!"></Word>
> </Vocab>
this works just fine on my machine.
what's the exact error message?
what does
print repr(open("test2.xml").read())
print on your machine?
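
For readers on Python 3, a rough equivalent of that check is sketched below. It opens the file in binary mode so that repr() shows the raw bytes instead of decoded text. The helper name write_and_check and the EUC-KR comparison are assumptions made for illustration: the thread does not say how the file was actually saved, and a mismatch between the declared and the real encoding is only one possible cause of a "not well-formed" error.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Vocab>\n"
    '<Word L1="어녕하세요!"></Word>\n'
    "</Vocab>\n"
)


def write_and_check(encoding):
    """Save `text` in the given encoding, then return (raw bytes, parse outcome)."""
    path = os.path.join(tempfile.mkdtemp(), "test2.xml")
    with open(path, "wb") as f:
        f.write(text.encode(encoding))

    # Binary mode: we want the bytes on disk, not Python's decoded view of them.
    with open(path, "rb") as f:
        raw = f.read()

    try:
        ET.fromstring(raw)
        outcome = "parsed"
    except ET.ParseError as exc:
        outcome = "ParseError: %s" % exc
    return raw, outcome


# The declaration says UTF-8 in both cases; only the bytes on disk differ.
for enc in ("utf-8", "euc-kr"):
    raw, outcome = write_and_check(enc)
    print(enc, repr(raw), outcome)
```

In the repr() output, UTF-8 Korean shows up as three-byte sequences per syllable, while a legacy Korean encoding shows two bytes per syllable.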
what happens if you attempt to parse
<Vocab>
<Word L1="어녕하세요!" />
</Vocab>
?
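
A related experiment, offered as an assumption and not as something stated in the thread: writing the same attribute with numeric character references keeps the document pure ASCII, so the encoding used to save the file no longer matters. A minimal Python 3 sketch:

```python
import xml.etree.ElementTree as ET

# Same snippet, no XML declaration, with each Korean syllable written as
# a numeric character reference (hexadecimal code point).
ascii_xml = (
    b"<Vocab>\n"
    b'<Word L1="&#xC5B4;&#xB155;&#xD558;&#xC138;&#xC694;!" />\n'
    b"</Vocab>\n"
)

root = ET.fromstring(ascii_xml)
value = root.find("Word").get("L1")

# The parser expands the references into the corresponding characters.
print(value == "어녕하세요!")
```

If this variant parses while the file on disk does not, that points at the file's bytes, not at the XML structure.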
</F>