[Python-Dev] PEP 263 considered faulty (for some Japanese)

Martin v. Loewis martin@v.loewis.de
13 Mar 2002 08:54:18 +0100


Tom Emerson <tree@basistech.com> writes:

> The UTF-8 BOM is an aBOMination that should not be allowed to
> live. The only editor that I know of that inserts the sequence is
> Microsoft's WordPad (or TextPad, I don't use either). I hope XEmacs
> isn't going to do this.

I used to think the same way, but now I have changed sides. I still
agree that the notion of UCS byte orders is an abomination, and even
that using UCS in on-disk files is a stupid thing to do.

Reliable detection of encodings is a good thing, though, as the Web
has demonstrated. Encoding declarations are good (this is the idea
behind PEP 263). Just consider the UTF-8 BOM not as a byte-order mark
(what byte order, anyway?), but as an encoding declaration, or
signature. With that view, I can happily accept it as useful, and I
wish more editors would at least comprehend it (in the sense of
displaying it with zero width), and perhaps even generate it.
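
To illustrate the "signature" reading, here is a minimal sketch of how
a reader of source files might treat a leading UTF-8 BOM as an
encoding declaration. The helper name detect_signature is hypothetical
and is not taken from PEP 263 or any existing implementation; only
codecs.BOM_UTF8 is a real standard-library constant.

```python
import codecs


def detect_signature(data: bytes):
    """Return (encoding, payload) for a raw byte string.

    Hypothetical helper: if the data starts with the UTF-8 signature
    (the bytes EF BB BF), report "utf-8" and strip the signature;
    otherwise report None and leave the data untouched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", data[len(codecs.BOM_UTF8):]
    return None, data


# Usage: a file that begins with the signature declares itself as UTF-8.
sample = codecs.BOM_UTF8 + "# 日本語\n".encode("utf-8")
encoding, payload = detect_signature(sample)
if encoding is not None:
    text = payload.decode(encoding)
```

When no signature is present the sketch makes no guess at all, so the
caller would still need some other declaration to decide the encoding.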

Regards,
Martin