[I18n-sig] XML and UTF-16

Tom Emerson tree@basistech.com
Thu, 31 May 2001 17:35:30 -0400


Paul Prescod writes:
> I think so. UTF-32 is a 32-bit encoding and 32 bits are 4 bytes. You
> only need one character (either a BOM or a "<") sign to know what you
> are dealing with.

Well, you know that the first UTF-32 character is "<", but nothing
more. I'd at least look for "<?xml" to be absolutely sure, though I'm
admittedly overly paranoid. With only the "<" to go on, you could just
as well be looking at "<!DOCTYPE" or some such.
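
To make the distinction concrete, here is a rough Python sketch of the
check I have in mind. The function name sniff_utf32 and the three
confidence labels are made up for illustration, and it only looks at
UTF-32, so treat it as a sketch under those assumptions rather than a
complete encoding detector.

```python
import codecs

# The two UTF-32 byte order marks and the byte order each one implies.
_UTF32_BOMS = {
    codecs.BOM_UTF32_BE: "utf-32-be",
    codecs.BOM_UTF32_LE: "utf-32-le",
}


def sniff_utf32(data: bytes):
    """Guess whether `data` starts a UTF-32 XML entity.

    Returns (encoding, evidence) or None. `evidence` is a hypothetical
    label saying how much of the input backed up the guess.
    """
    # A BOM settles the byte order on its own.
    for bom, encoding in _UTF32_BOMS.items():
        if data.startswith(bom):
            return encoding, "bom"

    # No BOM: compare against "<?xml" and "<" in both byte orders.
    for encoding in ("utf-32-be", "utf-32-le"):
        if data.startswith("<?xml".encode(encoding)):
            return encoding, "xml-declaration"
        if data.startswith("<".encode(encoding)):
            # Only the first character matched; this could just as
            # well be "<!DOCTYPE" or an element start tag.
            return encoding, "first-character-only"

    return None
```

A caller that shares my paranoia would accept "bom" and
"xml-declaration" outright and treat "first-character-only" as a hint
that still needs confirming.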

-- 
Tom Emerson                                          Basis Technology Corp.
Sr. Sinostringologist                              http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"