[I18n-sig] Mixed encodings and XML
Uche Ogbuji
uche.ogbuji@fourthought.com
Wed, 13 Dec 2000 15:59:31 -0700
[crossposted: 4Suite, xml-sig, i18n-sig]
Time for me to expose my ignorance on XML and i18n again.
How would one go about creating a well-formed XML document with multiple
encodings? For instance, if I had UCS-2, UTF-8 and BIG5 all in one doc,
how could I make it work? Take the following example:
ftp://ftp.fourthought.com/pub/etc/HOWTO/cjkv.doc
This document is a CJKV HOWTO by Chen Chien-Hsun. He originally wrote
it in HTML. See:
ftp://ftp.fourthought.com/pub/etc/HOWTO/CJKV_4XSLT.HTM
It contains many sections inside HTML PRE elements, each in one of the
encodings I mentioned. They look like this:
<PRE LANG="zh-TW">
... BIG5-encoded stuff ...
</PRE>
I need to convert the document to DocBook XML format. My naive attempts
at converting these sections to
<screen xml:lang="zh-TW">
... BIG5-encoded stuff ...
</screen>
of course don't work, because the parser takes one look at the BIG5 and
throws a well-formedness error.
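To make the failure concrete, here is a minimal sketch that reproduces
it with expat from the Python standard library. The helper name
is_well_formed, the UTF-8 declaration and the two sample characters are
my own stand-ins for illustration; they are not taken from the HOWTO.

```python
import xml.parsers.expat


def is_well_formed(data: bytes) -> bool:
    """Return True if expat parses `data` without a well-formedness error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Two Chinese characters encoded as BIG5 (a stand-in for the real PRE content).
big5_bytes = "\u4e2d\u6587".encode("big5")

# The document declares UTF-8, but the element content is raw BIG5 bytes.
mixed_doc = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<screen xml:lang="zh-TW">' + big5_bytes + b"</screen>"
)

print(is_well_formed(mixed_doc))
```

The BIG5 bytes are not a valid UTF-8 sequence, so the parser should
reject the document as soon as it reaches them.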
Is there any way to manage this besides using XInclude? Do any of the
Python parsers have any tricks that could help?
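For comparison, the only workaround I can sketch myself is not a parser
trick at all: decode each section from its known encoding into Unicode
and serialize the whole document in a single encoding. The function
name build_screen and the sample text below are hypothetical, and this
assumes the source encoding of each section is known in advance.

```python
import xml.dom.minidom


def build_screen(raw: bytes, source_encoding: str, lang: str) -> bytes:
    """Decode `raw` from `source_encoding` and wrap it in a UTF-8 <screen>."""
    text = raw.decode(source_encoding)  # bytes -> Unicode text
    doc = xml.dom.minidom.Document()
    screen = doc.createElement("screen")
    screen.setAttribute("xml:lang", lang)
    screen.appendChild(doc.createTextNode(text))
    doc.appendChild(screen)
    # Serialize the whole document in one encoding (UTF-8 here).
    return doc.toxml(encoding="utf-8")


# Hypothetical sample: two characters that arrive as BIG5 bytes.
big5_section = "\u4e2d\u6587".encode("big5")
utf8_doc = build_screen(big5_section, "big5", "zh-TW")

# The result parses cleanly and the text survives the round trip.
reparsed = xml.dom.minidom.parseString(utf8_doc)
print(reparsed.documentElement.firstChild.data == "\u4e2d\u6587")
```

This loses the original byte sequences, which may matter for a HOWTO
whose point is to show the encodings themselves.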
Thanks.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python