Is there really a default source encoding?

Brian Quinlan brian at sweetapp.com
Fri Jan 24 11:49:16 EST 2003


> > No, UTF-32 exists. For Japanese, UTF-8 requires (at minimum) 50% more
> > space per character than UTF-16. I was being facetious with my UTF-32
> > comment. But UTF-32 may become more efficient than UTF-16, for some
> > languages (e.g. Sanskrit), in the future.
> 
> Hardly so. UTF-16 requires four bytes per character in the worst case;
> UTF-32 requires four bytes per character for every character.

What if, in the future, there are close to 2^32 Unicode characters?
UTF-32 might then require only 4 bytes to store a character while
UTF-16 would require 6. Or is that impossible?
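
For anyone who wants to check the per-character sizes being argued about
here, a rough sketch follows. The helper name encoded_sizes is made up
for illustration, and the two sample characters are arbitrary picks: one
kanji from the Basic Multilingual Plane and one character outside it. As
Unicode is currently defined, code points stop at U+10FFFF, so under
that assumption UTF-16 never needs more than one surrogate pair (4
bytes) per character.

```python
def encoded_sizes(text):
    """Return the encoded byte length of text in three Unicode encodings.

    The explicit little-endian codecs are used so that no BOM is added
    and only the character data itself is counted.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# A kanji inside the Basic Multilingual Plane (U+65E5).
print(encoded_sizes("\u65e5"))

# A character outside the BMP (U+10400), which UTF-16 stores as a
# surrogate pair.
print(encoded_sizes("\U00010400"))
```

The kanji comes out at 3 bytes in UTF-8 against 2 in UTF-16, which is
where the 50% figure comes from. The second character takes 4 bytes in
all three encodings.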

> I usually refer to the UTF-8 BOM as "UTF-8 signature", as it does not
> indicate a byte order, but indicates the encoding itself.

That's exactly what I said. I like the UTF-8 BOM because it allows
instant encoding detection. 
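
To show what I mean by instant detection, here is a minimal sketch that
looks at the first bytes of a file. The function name detect_bom is
hypothetical, and this is only an illustration using the BOM constants
from the standard codecs module, not how Python itself decides on a
source encoding. Data without a BOM simply yields None, so a real tool
would still need some fallback.

```python
import codecs


def detect_bom(data):
    """Guess an encoding from a leading BOM, or return None if absent.

    The UTF-32 BOMs are tested before the UTF-16 ones because the
    UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
    """
    candidates = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    return None


# The UTF-8 signature is the three bytes EF BB BF.
print(detect_bom(codecs.BOM_UTF8 + b"print('hello')\n"))
print(detect_bom(b"print('hello')\n"))
```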

Cheers,
Brian





