[I18n-sig] XML and UTF-16
Tom Emerson
tree@basistech.com
Thu, 31 May 2001 13:23:31 -0400
M.-A. Lemburg writes:
> What is the standard file layout to use for storing an XML file
> in UTF-16 ?
I thought this was covered in the XML specification as a non-normative
appendix, and it is: Appendix F, "Autodetection of Character Encodings".
> 1) encode the whole file in UTF-16 (possibly prepended with a BOM)
Yes. You can then pretty easily autodetect which Unicode
transformation format is being used by looking at the first ten or
so bytes.
If the BOM is present, that's a big clue right there.
UTF-16-BE will have the first "<?xml" encoded as
003C 003F 0078 006D 006C
while UTF-16-LE will have it encoded as
3C00 3F00 7800 6D00 6C00
ASCII and UTF-8 will just have
3C 3F 78 6D 6C
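The check described above can be sketched in a few lines of Python. This
is only an illustration, not a complete detector: the function name
sniff_xml_encoding and its return labels are made up for this example,
and it deliberately ignores UTF-32 and EBCDIC, which a real parser would
also have to consider.

```python
import codecs


def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding family of an XML document from its first bytes.

    Hypothetical helper for illustration; UTF-32 and EBCDIC are not handled.
    """
    # A BOM, if present, is the strongest clue.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # No BOM: look at how the leading "<?" is laid out in memory.
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    if head.startswith(b"<?xm"):
        # ASCII, UTF-8, or another ASCII-compatible encoding; the
        # encoding declaration has to be read to tell them apart.
        return "ascii-compatible"
    return "unknown"


if __name__ == "__main__":
    for label in ("utf-16-be", "utf-16-le", "utf-8"):
        data = '<?xml version="1.0"?>'.encode(label)
        print(label, data[:10].hex(), sniff_xml_encoding(data[:10]))
```

Without a BOM, the result for plain ASCII and UTF-8 input is the same,
so the encoding declaration in the XML header still has to be parsed in
that case.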
> 2) write the first line containing the XML header (which has the
> encoding information) in ASCII and then proceed with UTF-16
> starting after the newline character
Ugh, no.
-tree
--
Tom Emerson Basis Technology Corp.
Sr. Sinostringologist http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"