[XML-SIG] how to get xml parse with non ascii charset

Uche Ogbuji uche.ogbuji at fourthought.com
Sat Jul 26 09:18:17 EDT 2003

> Dieter Maurer <dieter at handshake.de> writes:
> > XML processing systems (such as the parser "Expat") are only required
> > to support UTF-8 (and maybe UTF-16) encodings. All other
> > encodings are optional.
> > 
> > Apparently, "Expat" does not support "gb2312".
> Correct. pyexpat only supports single-byte encodings, and UTF-8. If
> you want to use other encodings with PyXML, you will have to use
> xmlproc.

FWIW, 4Suite's Domlettes extend Expat's built-in decoders by falling back to
the Python codec for encodings Expat does not know.  This means that as long
as you have a Python codec named "gb2312" installed, your document should
parse OK.
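
To see whether such a codec is installed, something like the following should
do.  It is only a minimal sketch: has_codec is an illustrative helper name of
my own, not part of 4Suite or PyXML, and it checks nothing beyond what the
Python codec registry reports.

```python
import codecs

def has_codec(name):
    """Return True if Python's codec registry knows an encoding by this name.

    Hypothetical helper for illustration only; not a 4Suite or PyXML API.
    """
    try:
        codecs.lookup(name)  # raises LookupError for unknown encodings
    except LookupError:
        return False
    return True

if __name__ == "__main__":
    # Report whether a codec for the encoding in question is available.
    print(has_codec("gb2312"))
```

If this reports False, the codec has to be installed before the parse can
succeed.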

If you want to give it a try, see


for intros to Domlette.

Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
XML Data Bindings in Python, Part 2 - http://www.xml.com/pub/a/2003/07/02/py-xm
Introducing Examplotron - http://www-106.ibm.com/developerworks/xml/library/x-x
Charming Jython - http://www-106.ibm.com/developerworks/java/library/j-jython.h
Python, Web services, and XSLT - http://www-106.ibm.com/developerworks/xml/libr
A custom-fit career in app development - http://www.adtmag.com/article.asp?id=7
