ElementTree cannot parse UTF-8 Unicode?

Erik Bethke erikbethke at gmail.com
Wed Jan 19 10:54:21 EST 2005


Hello All,

I am getting a "not well-formed" error at the beginning of the Korean
text in the second example.  Am I doing something wrong with how I am
encoding my Korean?  Do I need more of a wrapper around it than simple
quotes?  Is there some sort of XML syntax for indicating a Unicode
string, or does the ElementTree library just not support reading
Unicode?

Here is my test snippet:

from elementtree import ElementTree
vocabXML = ElementTree.parse('test2.xml').getroot()

where I have two data files:

this one works:
<?xml version="1.0" encoding="UTF-8"?>
<Vocab>
<Word L1='Hahha'></Word>
</Vocab>

this one fails:
<?xml version="1.0" encoding="UTF-8"?>
<Vocab>
    <Word L1="어녕하세요!"></Word>
</Vocab>
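
One possibility worth ruling out is that the bytes saved on disk are not actually UTF-8, even though the XML declaration says they are. The sketch below is only an illustration under assumptions: it uses the standard-library xml.etree.ElementTree (not the standalone elementtree package imported above), builds the failing document in memory instead of reading test2.xml, and picks cp949 as a guess at what a Korean-locale editor might save. The helper name try_parse is hypothetical.

```python
import xml.etree.ElementTree as ET

# The failing document from the post, held as text so it can be
# encoded in different ways before parsing.
XML_TEXT = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<Vocab>\n'
    '    <Word L1="어녕하세요!"></Word>\n'
    '</Vocab>\n'
)


def try_parse(data):
    """Hypothetical helper: return (True, root) or (False, error text)."""
    try:
        return True, ET.fromstring(data)
    except ET.ParseError as exc:
        return False, str(exc)


# Case 1: the bytes really are UTF-8, matching the declaration.
ok_utf8, root = try_parse(XML_TEXT.encode("utf-8"))

# Case 2: the bytes are cp949 (an assumed legacy Korean encoding),
# but the declaration still claims UTF-8.
ok_cp949, error_text = try_parse(XML_TEXT.encode("cp949"))

print("UTF-8 bytes parsed:", ok_utf8)
print("cp949 bytes parsed:", ok_cp949)
if not ok_cp949:
    print("parser error:", error_text)
```

If the second case reproduces the error seen with test2.xml, re-saving the file as UTF-8 or changing the declared encoding to match the actual bytes would be the things to try. This does not prove that the encoding mismatch is the cause here.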



