[XML-SIG] how to get XML parsed with a non-ASCII charset
uche.ogbuji at fourthought.com
Sat Jul 26 09:18:17 EDT 2003
> Dieter Maurer <dieter at handshake.de> writes:
> > XML processing systems (such as the parser "Expat") are only required
> > to support UTF-8 (and maybe UTF-16) encodings. All other
> > encodings are optional.
> > Apparently, "Expat" does not support "gb2312".
> Correct. pyexpat only supports single-byte encodings and UTF-8. If
> you want to use other encodings with PyXML, you will have to use
FWIW, 4Suite's Domlettes extend Expat's built-in decoders by falling back on
the Python codec for any encoding Expat does not know. This means that as long
as you have a Python codec named "gb2312" installed, your document should
parse OK.
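For readers without 4Suite, here is a rough stdlib-only sketch of the same idea. Let a Python codec do the decoding Expat cannot, then hand Expat UTF-8, which every XML processor must support. This is not the Domlette API. The helper name `parse_with_python_codec` is hypothetical, and the sketch assumes Python 3's `xml.parsers.expat`. There, an `encoding` passed to `ParserCreate` overrides the encoding declared in the document.

```python
import xml.parsers.expat


def parse_with_python_codec(data: bytes, codec: str) -> str:
    """Hypothetical helper: return the character data of an XML document
    whose encoding Expat may not support natively (e.g. "gb2312").

    Assumption: `codec` names an installed Python codec matching the
    document's declared encoding; data.decode() raises LookupError if not.
    """
    # Let Python's codec do the decoding, then re-encode as UTF-8.
    utf8_bytes = data.decode(codec).encode("utf-8")

    # The encoding given to ParserCreate overrides the document's own
    # declaration, so Expat never tries to handle encoding="gb2312" itself.
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")

    # Expat may deliver one text run in several chunks; collect and join them.
    chunks = []
    parser.CharacterDataHandler = chunks.append
    parser.Parse(utf8_bytes, True)
    return "".join(chunks)


doc = '<?xml version="1.0" encoding="gb2312"?><p>中文</p>'.encode("gb2312")
text = parse_with_python_codec(doc, "gb2312")  # text == "中文"
```

The trade-off is that the whole document is decoded up front instead of streamed. For large inputs, an incremental decoder from the `codecs` module could feed `Parse` chunk by chunk instead.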
If you want to give it a try, see
for intros to Domlette.
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com