[I18n-sig] Mixed encodings and XML

Tom Emerson tree@basistech.com
Wed, 13 Dec 2000 20:22:19 -0500


uche.ogbuji@fourthought.com writes:
> Good question.  I have not tried Chen Chien-Hsun's original HTML.
> Perhaps even that won't work in a browser.  Makes sense.  What does
> a browser do with a document with
> 
> <META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=iso-8859-1'>
>                                                             ^^^^^^^^^^
>                                                             !!!!???!!!!
> 
> In the header and then runs into a big patch of UCS-2 or BIG5?

It treats those bytes as 8-bit Latin-1 characters and displays them
as such. Once you've seen enough of these you start recognizing the
patterns, but it is still junk.
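
To make that concrete, here is a rough Python sketch of what the
browser is effectively doing. The sample string and the variable
names are my own illustration, not taken from the original page, and
I am assuming the Chinese text was Big5:

```python
# Illustrative sample only: two Chinese characters, encoded as Big5.
text = "中文"
big5_bytes = text.encode("big5")   # two bytes per character here

# What a browser does when the header claims charset=iso-8859-1:
# every byte becomes one Latin-1 character.
misread = big5_bytes.decode("iso-8859-1")
print(repr(misread))               # four accented/symbol characters

# The bytes themselves are intact, which is why the junk has
# recognizable patterns: reversing the wrong decode recovers the text.
recovered = misread.encode("iso-8859-1").decode("big5")
```

Nothing is lost at this stage; the bytes are just labelled wrong.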

> My guess is that it displays gibberish as you suggest.  In this case, I think 
> there's no point expecting HTML generated from XML to do any better and it 
> simply makes sense to break out the alternatively encoded portions into 
> separate, linked files.

No. What makes sense, if the intention of the original author is to
show the Chinese text correctly, is to convert that section to UTF-8
and put that in the document.
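
Something along these lines, as a minimal sketch. The function name
to_utf8 and the sample markup are hypothetical, and it assumes you
already know which byte ranges are Latin-1 and which are Big5:

```python
def to_utf8(latin1_part: bytes, big5_part: bytes) -> bytes:
    """Decode each part with its real encoding, emit one UTF-8 stream."""
    text = latin1_part.decode("iso-8859-1") + big5_part.decode("big5")
    return text.encode("utf-8")

# Hypothetical input: a Latin-1 fragment followed by a Big5 fragment.
latin1_part = b"<p>caf\xe9: "
big5_part = "中文".encode("big5")

page = to_utf8(latin1_part, big5_part) + b"</p>"

# The header then has to say what the bytes really are.
meta = "<META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=utf-8'>"
```

The whole document ends up in one encoding, so the charset in the
header is true for every byte in the file.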

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"