[XML-SIG] Mixed encodings and XML

Uche Ogbuji uche.ogbuji@fourthought.com
Wed, 13 Dec 2000 15:59:31 -0700


[crossposted: 4Suite, xml-sig, i18n-sig]

Time for me to expose my ignorance on XML and i18n again.

How would one go about creating a well-formed XML document with multiple
encodings?  For instance, if I had UCS-2, UTF-8 and BIG5 all in one doc,
how could I make it work?  Take the following example:

ftp://ftp.fourthought.com/pub/etc/HOWTO/cjkv.doc

This document is a CJKV HOWTO by Chen Chien-Hsun.  He originally wrote
it in HTML.  See

ftp://ftp.fourthought.com/pub/etc/HOWTO/CJKV_4XSLT.HTM

It contains many sections within HTML PREs with the different encodings
I mentioned.  They look like

<PRE LANG="zh-TW">
... BIG5-encoded stuff ...
</PRE>

I need to convert the document to DocBook XML.  My naive attempts at
converting these sections to

<screen xml:lang="zh-TW">
... BIG5-encoded stuff ...
</screen>

of course don't work, because the parser takes one look at the BIG5 bytes
and throws a well-formedness error.
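
For anyone who wants to reproduce the failure, here is a minimal sketch.
It assumes the document is declared as UTF-8 and that raw BIG5 bytes are
spliced into one element; the two sample characters and the try_parse
helper are made up for illustration and are not taken from the HOWTO.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample text: two Chinese characters, encoded as BIG5 bytes.
big5_bytes = "\u4e2d\u6587".encode("big5")

# A document declared as UTF-8 with the raw BIG5 bytes dropped into it.
mixed_doc = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<screen xml:lang="zh-TW">' + big5_bytes + b'</screen>'
)


def try_parse(data):
    """Return None if data parses, otherwise the parser's error message."""
    try:
        minidom.parseString(data)
    except ExpatError as exc:
        return str(exc)
    return None


# The BIG5 bytes are not valid UTF-8, so the parser rejects the document.
error = try_parse(mixed_doc)
print(error)
```

The parser only ever sees one declared encoding per document, so bytes
in any other encoding are just invalid input as far as it is concerned.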

Is there any way to manage this besides using XInclude?  Do any of the
Python parsers have any tricks that could help?

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python