[XML-SIG] how to get xml parse with non ascii charset
Uche Ogbuji
uche.ogbuji at fourthought.com
Sat Jul 26 09:18:17 EDT 2003
> Dieter Maurer <dieter at handshake.de> writes:
>
> > XML processing systems (such as the parser "Expat") are only required
> > to support UTF-8 (and maybe UTF-16) encodings. All other
> > encodings are optional.
> >
> > Apparently, "Expat" does not support "gb2312".
>
> Correct. pyexpat only supports single-byte encodings, and UTF-8. If
> you want to use other encodings with PyXML, you will have to use
> xmlproc.
FWIW, 4Suite's Domlettes extend Expat's built-in decoders by falling back to the
Python codec for any encoding Expat doesn't know. This means that as long as you
have a Python codec named "gb2312" installed, your document should parse OK.
If you want to give it a try, see
http://www.xml.com/pub/a/2002/10/16/py-xml.html
http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/domlettes
for intros to Domlette.
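For readers on a current Python without 4Suite, here is a rough stdlib analogue of the same idea: let a Python codec do the decoding instead of Expat. This is not the Domlette API. It assumes Python 3's xml.etree.ElementTree. It also relies on one behavior of recent Python 3 versions: when the parser is given a str rather than bytes, Python hands Expat UTF-8 and tells it to use UTF-8, so Expat's own encoding support and the declared encoding are bypassed. The helper names have_codec and parse_with_python_codec are hypothetical.

```python
import codecs
import xml.etree.ElementTree as ET


def have_codec(name: str) -> bool:
    """Return True if Python has a codec registered under `name` (hypothetical helper)."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def parse_with_python_codec(data: bytes, encoding: str) -> ET.Element:
    """Decode `data` with a Python codec, then parse the resulting str (hypothetical helper).

    For str input, Python 3 feeds Expat UTF-8 and sets the parser encoding
    to UTF-8, so Expat never has to handle a multi-byte charset itself.
    """
    if not have_codec(encoding):
        raise LookupError(f"no Python codec named {encoding!r}")
    return ET.fromstring(data.decode(encoding))


# A small document declared and encoded as GB2312.
# "\u4f60\u597d" is the Chinese word for "hello".
doc = ('<?xml version="1.0" encoding="gb2312"?>'
       '<greeting>\u4f60\u597d</greeting>').encode("gb2312")

print(have_codec("gb2312"))  # True on standard CPython builds
root = parse_with_python_codec(doc, "gb2312")
print(root.tag, ascii(root.text))  # greeting '\u4f60\u597d'
```

The codec check only turns a missing codec into a clearer error. The real work is done by data.decode(encoding), which plays the role the Python codec plays in the Domlette approach described above.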
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
XML Data Bindings in Python, Part 2 - http://www.xml.com/pub/a/2003/07/02/py-xml.html
Introducing Examplotron - http://www-106.ibm.com/developerworks/xml/library/x-xmptron/
Charming Jython - http://www-106.ibm.com/developerworks/java/library/j-jython.html
Python, Web services, and XSLT - http://www-106.ibm.com/developerworks/xml/library/ws-pyth13/
A custom-fit career in app development - http://www.adtmag.com/article.asp?id=7744