Does python's minidom support Chinese?
uche at ogbuji.net
Mon Mar 15 05:09:45 CET 2004
Anthony Liu <antonyliu2002 at yahoo.com> wrote in message news:<mailman.250.1078955721.19534.python-list at python.org>...
> The following 4 lines of code parse an XML document
> very well if the XML document contains only English.
> But when I insert one Chinese character into the XML
> document, then Python starts to complain when it hits
> the Chinese character, saying that it is an invalid
> token and thus it is not well-formed.
> This is the complaint of Python:
> ExpatError: not well-formed (invalid token): line 3,
> column 7
> line 3 and column 7 exactly pinpoints the 1st Chinese
> character in the XML document.
This is an XML problem on your end, not a minidom problem. That error
probably means that you are either omitting the XML declaration (and
thus defaulting to UTF-8 or UTF-16) or declaring a bogus encoding.
> The problem remains even if I try encoding="UTF-16" or
> encoding="GB2312" or encoding="GBK" in the XML
> declaration.
Well, you can't just go shopping about for an encoding. You have to
find out which encoding the document is actually saved in and declare
it accordingly.
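To make the point concrete, here is a minimal sketch, assuming a
current Python 3 rather than the 2004-era interpreter discussed in this
thread. The element names and the sample character are made up for
illustration. It parses a document whose bytes match its declared
encoding, then feeds the parser GBK bytes with no declaration, so the
parser assumes UTF-8 and rejects them.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

TEXT = "\u4e2d"  # a single Chinese character, used only as sample data

# Case 1: the bytes really are UTF-8 and the declaration says so.
good = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<root>\n<item>" + TEXT + "</item>\n</root>"
).encode("utf-8")
doc = minidom.parseString(good)
parsed_text = doc.getElementsByTagName("item")[0].firstChild.data

# Case 2: the bytes are GBK but nothing declares that, so the parser
# falls back to UTF-8 and chokes on the first non-ASCII byte.
bad = ("<root>\n<item>" + TEXT + "</item>\n</root>").encode("gbk")
try:
    minidom.parseString(bad)
    mismatch_error = None
except ExpatError as exc:
    mismatch_error = exc  # exc.lineno and exc.offset locate the bad byte
```

The second case should fail with the same kind of "not well-formed"
complaint quoted above, pointing at the position of the Chinese
character.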
Back to minidom: even after you fix your XML problems you may still
have trouble with minidom because the expat reader has to understand
the encoding you're using. I think that it may use the Python codecs
model to find the encoding you declared, so you may just need to
install a Python Chinese codecs package, and you'll be all set. I'm
not entirely sure this is the case, though.
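A quick way to see whether Python itself can find a codec for a given
encoding name is to ask the codecs module. This is only a sketch:
has_codec is a hypothetical helper, not part of any library, and a
successful lookup shows that Python knows the codec, not that the expat
reader will accept a document declared in it.

```python
import codecs

def has_codec(name):
    """Return True if this Python installation can find a codec for name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

# Check the encodings mentioned in this thread.
available = {
    name: has_codec(name)
    for name in ("utf-8", "utf-16", "gb2312", "gbk")
}
```

If a name comes back False, a separate codecs package would be needed.
As far as I know, current Python releases ship the Chinese codecs in
the standard library.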