Does python's minidom support Chinese?
Uche Ogbuji
uche at ogbuji.net
Sun Mar 14 23:09:45 EST 2004
Anthony Liu <antonyliu2002 at yahoo.com> wrote in message news:<mailman.250.1078955721.19534.python-list at python.org>...
> The following 4 lines of code parses an XML document
> very well if the XML document contains only English
> words.
>
> But when I insert one Chinese character into the XML
> document, then Python starts to complain when it hits
> the Chinese character, saying that it is an invalid
> token and thus it is not well-formed.
>
> This is the complaint of Python:
>
> ExpatError: not well-formed (invalid token): line 3,
> column 7
>
> line 3 and column 7 exactly pinpoints the 1st Chinese
> character in the XML document.
This is an XML problem on your end, not a minidom problem. That error
probably means that you are either omitting the XML declaration (and
thus defaulting to UTF-8 or UTF-16) or declaring a bogus encoding.
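To illustrate the point about the declaration, here is a minimal sketch, assuming a current Python 3 and a made-up three-line document (the element names and the sample character are only for illustration). It parses the same text twice: once as bytes that really are UTF-8, and once as GBK bytes that still claim to be UTF-8.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample document holding one Chinese character (U+4E2D).
text = '<?xml version="1.0" encoding="UTF-8"?>\n<doc>\n  <w>\u4e2d</w>\n</doc>'

# Case 1: the bytes really are UTF-8, so they match the declaration.
good = text.encode("utf-8")
doc = minidom.parseString(good)
word = doc.getElementsByTagName("w")[0].firstChild.data

# Case 2: the bytes are GBK, but the declaration still says UTF-8.
# The parser trusts the declaration and rejects the bytes.
bad = text.encode("gbk")
try:
    minidom.parseString(bad)
    mismatch_error = None
except ExpatError as exc:
    mismatch_error = str(exc)

print(repr(word))
print(mismatch_error)
```

The second case should fail at the position of the Chinese character, which is the same kind of complaint reported above.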
> The problem remains even if I try encoding="UTF-16" or
> encoding="GB2312" or encoding="GBK" in the xml
> document.
Well, you can't just go shopping about for an encoding. You have to
know which encoding the document is actually saved in, and declare it
accordingly.
Back to minidom: even after you fix your XML problems you may still
have trouble with minidom because the expat reader has to understand
the encoding you're using. I think that it may use the Python codecs
model to find the encoding you declared, so you may just need to
install a Python Chinese codecs package, and you'll be all set. I'm
not entirely sure this is the case, though.
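If you would rather not depend on what the expat reader understands, one possible workaround is to let Python do the decoding and hand the parser UTF-8. The following is only a sketch under assumptions: it assumes a current Python 3 where a "gbk" codec is available, the helper name have_codec and the sample document are hypothetical, and the declaration is rewritten with a plain string replace rather than anything robust.

```python
import codecs
from xml.dom import minidom


def have_codec(name):
    """Return True if Python can find a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Hypothetical document that really is stored as GBK bytes.
source = '<?xml version="1.0" encoding="GBK"?>\n<doc><w>\u4e2d</w></doc>'
raw = source.encode("gbk")

word = None
if have_codec("gbk"):
    # Decode with Python's own codec, fix the declaration so it matches
    # the new bytes, then give the parser UTF-8.
    text = raw.decode("gbk").replace('encoding="GBK"', 'encoding="UTF-8"', 1)
    doc = minidom.parseString(text.encode("utf-8"))
    word = doc.getElementsByTagName("w")[0].firstChild.data

print(repr(word))
```

The check with codecs.lookup() only tells you whether Python itself knows the encoding; it does not tell you whether expat would have accepted the original declaration.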
--Uche
http://uche.ogbuji.net