[I18n-sig] Strawman Proposal (2): Encoding attributes

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Sat, 10 Feb 2001 07:55:12 +0100

> We have always said that only ASCII-supersets should be legal source
> file encodings.

That may be a bit too restrictive. I understand that people use all of
EUC-JP, Shift-JIS, and ISO-2022-JP to encode Japanese text. I'm not
certain whether ISO-2022-JP is used in source code, but the first two
certainly are (EUC-JP on Unix, Shift-JIS on Windows).

My understanding is that only EUC-JP is an ASCII superset (*)
(i.e. all bytes representing JIS characters are >127); in Shift-JIS,
a JIS character is encoded as two bytes, of which only the first byte
is always >127, while the second byte may fall into the ASCII range.
Since Shift-JIS is quite common, it should be supported as a file
encoding.
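
To illustrate the difference, here is a minimal sketch, assuming a
current Python 3 with its standard "euc_jp" and "shift_jis" codecs.
The helper name ascii_range_bytes and the sample character are
chosen purely for illustration; they are not part of any proposal.

```python
def ascii_range_bytes(text, encoding):
    """Return the bytes below 0x80 that occur in the encoded form of
    the non-ASCII characters of `text` (hypothetical helper)."""
    return [
        b
        for ch in text
        if ord(ch) > 127            # look only at non-ASCII characters
        for b in ch.encode(encoding)
        if b < 0x80                 # byte collides with the ASCII range
    ]

# KATAKANA LETTER SO, written as an escape so that this snippet
# does not itself depend on a source file encoding.
sample = "\u30bd"

euc = sample.encode("euc_jp")       # both bytes are > 127
sjis = sample.encode("shift_jis")   # second byte is 0x5C

print(euc.hex(), ascii_range_bytes(sample, "euc_jp"))
print(sjis.hex(), ascii_range_bytes(sample, "shift_jis"))
```

A tokenizer that scans the raw bytes for ASCII delimiters would not be
misled by the EUC-JP form, but in the Shift-JIS form it would see a
0x5C byte inside the character.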


(*) ignoring the question whether \x5c is the REVERSE SOLIDUS or the
YEN SIGN.