[I18n-sig] XML and UTF-16

Paul Prescod paulp@ActiveState.com
Thu, 31 May 2001 14:44:37 -0700


Tom Emerson wrote:
> 
> Paul Prescod writes:
> > I think so. UTF-32 is a 32-bit encoding and 32 bits are 4 bytes. You
> > only need one character (either a BOM or a "<" sign) to know what you
> > are dealing with.
> 
> Well, you know that the first UTF-32 character is "<", but no
> more. I'd at least look for "<?xml" to be absolutely sure, but I'm
> also overly paranoid. You could be looking at "<!DOCTYPE" or some
> such.

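Would it matter if you were looking at <!DOCTYPE? The first four bytes
identify the encoding as UTF-32 either way. Anyhow, a UTF-32 document
without an XML declaration would be in error. The declaration, with
its encoding pseudo-attribute, is required for every encoding other
than UTF-8 and UTF-16, unless a higher-level protocol supplies the
encoding.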
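
To make that concrete, here is a rough sketch of the check in Python.
It is not taken from any real parser. The function names sniff_utf32
and starts_with_xml_decl are made up for illustration, and it only
covers the UTF-32 case discussed here, not the full autodetection
table in the XML spec.

```python
import codecs

# First four bytes that identify UTF-32: a BOM, or "<" as one 32-bit unit.
_UTF32_SIGNATURES = {
    codecs.BOM_UTF32_BE: "utf-32-be",   # 00 00 FE FF
    codecs.BOM_UTF32_LE: "utf-32-le",   # FF FE 00 00
    b"\x00\x00\x00<": "utf-32-be",      # "<" big-endian
    b"<\x00\x00\x00": "utf-32-le",      # "<" little-endian
}


def sniff_utf32(data):
    """Return 'utf-32-be', 'utf-32-le', or None from the first four bytes."""
    return _UTF32_SIGNATURES.get(bytes(data[:4]))


def starts_with_xml_decl(data, encoding):
    """Return True if the document begins with "<?xml" (after any BOM)."""
    # A BOM plus five characters is at most 24 bytes in UTF-32.
    head = bytes(data[:24])
    head = head[:len(head) - len(head) % 4]  # keep whole 32-bit units only
    text = head.decode(encoding)
    # The endian-specific codecs leave the BOM in place as U+FEFF.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.startswith("<?xml")


if __name__ == "__main__":
    doc = "<!DOCTYPE a><a/>".encode("utf-32-le")
    enc = sniff_utf32(doc)
    print(enc, starts_with_xml_decl(doc, enc))
```

The first function is all you need to pick the decoder. The second is
the separate question of whether the document is legal, which is where
the <!DOCTYPE case gets caught.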

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook