Does python's minidom support Chinese?

Uche Ogbuji uche at
Mon Mar 15 05:09:45 CET 2004

Anthony Liu <antonyliu2002 at> wrote in message news:<mailman.250.1078955721.19534.python-list at>...
> The following 4 lines of code parse an XML document
> very well if the XML document contains only English
> words.
> But when I insert one Chinese character into the XML
> document, then Python starts to complain when it hits
> the Chinese character, saying that it is an invalid
> token and thus it is not well-formed.
> This is the complaint of Python:
> ExpatError: not well-formed (invalid token): line 3,
> column 7
> line 3 and column 7 exactly pinpoints the 1st Chinese
> character in the XML document.

This is an XML problem on your end, not a minidom problem.  That error
probably means that you are either omitting the XML declaration (and
thus defaulting to UTF-8 or UTF-16) or declaring a bogus encoding.
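
To make the failure mode concrete, here is a minimal sketch, assuming a current Python 3 where the gbk codec ships with the standard library (that was not the case for Python in 2004). The sample document and the helper name `parses` are made up for illustration; they are not the original poster's four lines. It shows that the same markup parses when the bytes match the declaration, and fails with ExpatError when GBK bytes are undeclared or labelled as UTF-8.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample document containing two Chinese characters.
BODY = "<doc>\n  <title>中文</title>\n</doc>"
DECL_UTF8 = '<?xml version="1.0" encoding="UTF-8"?>\n'


def parses(raw_bytes):
    """Return True if minidom accepts the bytes, False on ExpatError."""
    try:
        minidom.parseString(raw_bytes).unlink()
        return True
    except ExpatError:
        return False


# Bytes really are UTF-8 and the declaration says so.
utf8_declared = (DECL_UTF8 + BODY).encode("utf-8")

# Bytes are GBK but there is no declaration, so the parser assumes UTF-8.
gbk_undeclared = BODY.encode("gbk")

# Bytes are GBK but the declaration claims UTF-8.
gbk_mislabeled = (DECL_UTF8 + BODY).encode("gbk")

for label, raw in [
    ("utf-8, declared", utf8_declared),
    ("gbk, undeclared", gbk_undeclared),
    ("gbk, labelled utf-8", gbk_mislabeled),
]:
    print(label, "->", "ok" if parses(raw) else "not well-formed")
```

The point is that the declaration is a statement about the bytes on disk; changing it without re-saving the file in that encoding does not help.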

> The problem remains even if I try encoding="UTF-16" or
> encoding="GB2312" or encoding="GBK" in the xml
> document.

Well, you can't just go shopping about for one.  You have to find out
which encoding the file is actually saved in, and declare it accordingly.

Back to minidom: even after you fix your XML problems you may still
have trouble with minidom because the expat reader has to understand
the encoding you're using.  I think that it may use the Python codecs
model to find the encoding you declared, so you may just need to
install a Python Chinese codecs package, and you'll be all set.  I'm
not entirely sure this is the case, though.
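
If you want to check the codec question yourself, the following is a rough sketch, again assuming a current Python 3. `has_codec` and `parse_as_utf8` are hypothetical helper names, not part of minidom. The first asks Python's codec registry whether an encoding is known. The second sidesteps the question of what the expat reader understands by decoding the bytes in Python, dropping the old XML declaration, and handing minidom plain UTF-8. The declaration-stripping regex is deliberately simple and only handles a declaration at the very start of the file.

```python
import codecs
import re
from xml.dom import minidom


def has_codec(name):
    """Return True if Python's codec registry knows this encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def parse_as_utf8(raw_bytes, source_encoding):
    """Decode with a Python codec, then parse the text as UTF-8.

    Assumes the caller already knows the real encoding of raw_bytes.
    """
    text = raw_bytes.decode(source_encoding)
    # Remove a leading XML declaration so its old encoding name
    # does not contradict the UTF-8 bytes we are about to produce.
    text = re.sub(r'^<\?xml[^>]*\?>', '', text, count=1)
    return minidom.parseString(text.encode("utf-8"))


# Hypothetical GBK-encoded input with a matching declaration.
raw = '<?xml version="1.0" encoding="GBK"?>\n<doc>中文</doc>'.encode("gbk")

if has_codec("gbk"):
    doc = parse_as_utf8(raw, "gbk")
    print(doc.documentElement.firstChild.data)
```

This trades a little extra work for not depending on which encodings the underlying parser can resolve on its own.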

