[Python-Dev] PEP 263 considered faulty (for some Japanese)

Tue, 12 Mar 2002 12:36:23 +0900

   I have been a Japanese fan, developer, and user of Python for
years.  I recently read PEP 263, "Defining Python Source Code
Encodings".  We discussed it on the Japanese Python mailing
list last week, and several of us found a severe fault in it.
   I have also read the Parade of the PEPs and know that it is
very close to being checked in, so I am writing this message to
you in English in a hurry.  PEP 263, as it stands, will damage
the usability of Python in Japan.

   The PEP says, "Just as in coercion of strings to Unicode,
Python will default to the interpreter's default encoding (which
is ASCII in standard Python installations) as standard encoding
if no other encoding hints are given."  This will free many
English-speaking users from writing the magic comment in their
scripts explicitly.  However, many Japanese users set the
default encoding to something other than ASCII (we use
multi-byte encodings daily, not as a luxury), and some set it
to, say, "utf-16".

   Under the PEP as it stands, people who use "utf-16" etc. will
no longer be able to run many Python scripts, because an ASCII
source file with no encoding hint would be read as UTF-16.
Certainly you can tell them not to use "utf-16" as the default
encoding.  But some of them have been writing their scripts in
ASCII, just as specified in the Language Reference, and merely
omit the encoding specification in their scripts so that they
can handle their Unicode documents easily.  Thus it is safe to
say that penalizing them is simply unfair.
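
   To illustrate the failure, here is a minimal sketch.  It is
written for present-day Python 3 rather than the interpreter under
discussion, and the helper name reads_correctly is hypothetical.
It only mimics what happens when an ASCII source file is decoded
with whatever encoding is assumed by default.

```python
# Illustration only: modern Python 3, hypothetical helper name.
# It mimics reading a source file under an assumed default encoding.

def reads_correctly(source_bytes, assumed_encoding):
    """Return True if decoding with assumed_encoding recovers the
    text that the author wrote in plain ASCII."""
    intended = source_bytes.decode("ascii")
    try:
        return source_bytes.decode(assumed_encoding) == intended
    except UnicodeDecodeError:
        # The bytes are not even valid in the assumed encoding.
        return False

script = b"print('hello')\n"   # pure ASCII, no encoding hint

for encoding in ("ascii", "utf-8", "utf-16"):
    print(encoding, reads_correctly(script, encoding))
```

   Of the three encodings tried, only "utf-16" fails to recover the
original text, which is the situation a user with that default
would be in.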

   At the very least, I propose that Python default to ASCII as
the standard encoding if no other encoding hints are given.  The
interpreter's default encoding should not be consulted for
source code.
   Better still, I hope that Python will default to UTF-8 as the
standard encoding if no other encoding hints are given.  UTF-8
is perfectly ASCII-compatible and language-neutral.  Once you
commit yourself to Unicode, I think UTF-8 is the obvious choice
anyway.
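
   The ASCII compatibility claimed here can be checked with a small
sketch (again in present-day Python 3; both helper names are
hypothetical).  It shows that ASCII text has the same bytes in
UTF-8, and that non-ASCII characters encoded in UTF-8 never reuse a
byte value from the ASCII range.

```python
# Illustration only: modern Python 3, hypothetical helper names.

def same_bytes_in_utf8(ascii_text):
    """ASCII text is byte-for-byte identical when encoded as UTF-8."""
    return ascii_text.encode("utf-8") == ascii_text.encode("ascii")

def avoids_ascii_bytes(non_ascii_text):
    """Every byte of a non-ASCII character in UTF-8 is 0x80 or above,
    so it cannot be mistaken for a quote, backslash, or newline."""
    return all(byte >= 0x80 for byte in non_ascii_text.encode("utf-8"))

ascii_source = "def double(x):\n    return x * 2\n"
japanese = "\u65e5\u672c\u8a9e"   # the word for "Japanese language"

print(same_bytes_in_utf8(ascii_source))
print(avoids_ascii_bytes(japanese))
```

   Both checks succeed, so an existing ASCII script would be read
identically under a UTF-8 default.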

   In my experience, inserting the '-*- coding: <coding name>
-*-' line into an existing file and converting such a file to
UTF-8 take almost the same amount of work.  We would be glad if
Python understood Japanese (and other) characters by default
(by adopting, say, UTF-8 as the default).
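
   The two tasks compared above can each be sketched in a few
lines.  This is an illustration in present-day Python 3 and not
part of the PEP.  The function names are hypothetical, and "euc_jp"
is just one example of an encoding an existing file might use.

```python
# Illustration only: modern Python 3, hypothetical function names.

def add_coding_line(source_bytes, coding_name):
    """Insert a '-*- coding: ... -*-' comment, after a '#!' line if any."""
    declaration = ("# -*- coding: %s -*-\n" % coding_name).encode("ascii")
    if source_bytes.startswith(b"#!"):
        first_line, _, rest = source_bytes.partition(b"\n")
        return first_line + b"\n" + declaration + rest
    return declaration + source_bytes

def convert_to_utf8(source_bytes, from_encoding):
    """Re-encode an existing source file as UTF-8."""
    return source_bytes.decode(from_encoding).encode("utf-8")

# An example script stored in EUC-JP (an assumption for this sketch).
text = "print('\u65e5\u672c\u8a9e')\n"
legacy = text.encode("euc_jp")

declared = add_coding_line(legacy, "euc-jp")    # option 1: declare it
converted = convert_to_utf8(legacy, "euc_jp")   # option 2: convert it
```

   Either way, each file is touched once, which is why the effort
is comparable.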

--
SUZUKI Hisao <suzuki@acm.org> <suzuki611@okisoft.co.jp>