[I18n-sig] Mixed encodings and XML
Tom Emerson
tree@basistech.com
Wed, 13 Dec 2000 20:22:19 -0500
uche.ogbuji@fourthought.com writes:
> Good question. I have not tried Chen Chien-Hsun's original HTML.
> Perhaps even that won't work in a browser. Makes sense. What does
> a browser do with a document with
>
> <META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=iso-8859-1'>
>                                                             ^^^^^^^^^^
>                                                             !!!!???!!!!
>
> In the header and then runs into a big patch of UCS-2 or BIG5?
It treats those bytes as 8-bit Latin-1 characters and displays them
as such. Once you've seen enough of these you start recognizing the
patterns, but it is still junk.
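
To make the effect concrete, here is a minimal sketch in Python 3. The
sample string and the variable names are my own illustration, not taken
from the original HTML, and it assumes Python's standard big5 and
iso-8859-1 codecs behave like the browser's decoder for these bytes.

```python
# Illustration only: the sample text and names are assumptions.
original = "中文"                       # two Chinese characters
big5_bytes = original.encode("big5")    # Big5 uses two bytes for each of them

# Roughly what happens when the page claims charset=iso-8859-1:
# every single byte is turned into one Latin-1 character.
garbled = big5_bytes.decode("iso-8859-1")
print(repr(garbled))

# The bytes themselves are untouched, so decoding them with the
# right codec still gives back the original text.
recovered = garbled.encode("iso-8859-1").decode("big5")
print(recovered == original)
```

The two characters come out as four unrelated Latin-1 characters, which
is the kind of pattern you learn to recognize.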
> My guess is that it displays gibberish as you suggest. In this case, I think
> there's no point expecting HTML generated from XML to do any better and it
> simply makes sense to break out the alternatively encoded portions into
> separate, linked files.
No. What makes sense, if the intention of the original author is to
show the Chinese text correctly, is to convert that section to UTF-8
and put that in the document.
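
A sketch of that conversion, again in Python 3. The function name and
the sample bytes are hypothetical, and it assumes you already know the
section's real encoding (Big5 here) and that the enclosing document is
itself declared and served as UTF-8.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from their real encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical Big5 section lifted out of the mixed-encoding document.
big5_section = "中文".encode("big5")

# The same text, now safe to paste into a UTF-8 document.
utf8_section = transcode_to_utf8(big5_section, "big5")
print(utf8_section.decode("utf-8"))
```

If the guess about the source encoding is wrong, decode() will usually
raise UnicodeDecodeError rather than silently produce junk, which is
what you want here.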
-tree
--
Tom Emerson                                  Basis Technology Corp.
Zenkaku Language Hacker                    http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"