[Python-Dev] PEP 263 considered faulty (for some Japanese)

M.-A. Lemburg mal@lemburg.com
Tue, 12 Mar 2002 14:44:27 +0100


Guido van Rossum wrote:
> 
> >     Guido> I think I can propose a compromise though: there may be two
> >     Guido> default encodings, one used for Python source code, and one
> >     Guido> for data.
> 
> [Stephen J. Turnbull]
> > Why go in this direction?  It's better to allow each individual stream
> > to specify a codec to be implicitly applied, I think.  Consider Emacs,
> > for example, which allows specification of default codecs for (1) file
> > contents (2) names of file system objects (3) process I/O (but not I
> > and O and E separately, which has caused problems!) (4) console input
> > and (5) console output.  All of those are plausible candidates for
> > having separate defaults in Python as well.
> >
> > For example, in Japan it's easy to imagine a program with local file
> > contents defaulting to UTF-8 (for cross-system portability) needing to
> > access the Windows 9x console and file system in Shift JIS, while
> > process (eg, network) I/O might be EUC-JP if the server were Unix.
> > (Yes, I'm straining, but not much.)
> >
> > But if you allow codecs for each stream, people who want to have
> > different defaults for certain classes of stream would just derive
> > classes which initialized the default codec appropriately.
> 
> Attaching codecs to streams is currently pretty painful AFAICT (I've
> never tried it :-), but I think your idea has merit: there are
> sufficiently many different contexts where an encoding must be
> specified that it makes sense to allow setting different defaults for
> the different contexts.  The issue of filename encoding is one with
> which we (well, some of us) have struggled recently.
> 
> We'd have to think more about which contexts exactly to consider; for
> now I can come up with:
> 
> - file I/O;
> 
> - OS filenames;
> 
> - implicit mixing of 8-bit and Unicode strings;
> 
> - invocation of unicode(s) or u.decode() without an encoding.
> 
> I see your proposal as a possible future generalization of my
> two-encodings proposal, not as an incompatible alternative.

My position on this is *not* to introduce more defaults -- explicit
is better than implicit and in this particular case (encodings) 
it'll result in a net win.
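
To illustrate what "explicit" could look like, here is a minimal sketch,
written in present-day Python 3 syntax (which postdates this thread).
Each stream names its own codec through the standard codecs module, so no
process-wide default is consulted. The helper name read_all and the sample
string are made up for illustration only.

```python
import codecs
import io

def read_all(raw_bytes, encoding):
    """Decode a byte stream with an explicitly named codec (hypothetical helper)."""
    # codecs.getreader returns a StreamReader class for the given encoding;
    # wrapping the byte stream in it attaches the codec to that one stream.
    reader = codecs.getreader(encoding)(io.BytesIO(raw_bytes))
    return reader.read()

# Sample text: the word "Japanese" written in Japanese (three characters).
text = "\u65e5\u672c\u8a9e"

# The same text has different byte representations per encoding,
# which is why a silent default is risky.
sjis_bytes = text.encode("shift_jis")
eucjp_bytes = text.encode("euc_jp")

# Naming the codec at each boundary recovers the text unambiguously.
decoded_sjis = read_all(sjis_bytes, "shift_jis")
decoded_eucjp = read_all(eucjp_bytes, "euc_jp")
```

Both streams decode to the same text even though their bytes differ,
because the codec is stated where the data enters the program.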

> In the light of the post by Atsuo Ishimoto and the responses from both
> Marc-Andre Lemburg and Martin von Loewis, however, I'm not sure
> whether Suzuki Hisao's response represents the Japanese community, and
> it's possible that nothing needs to be done.

Well, users using non-ASCII coding in their source files
should start to be explicit about the encoding (in phase 1
they'll get a warning printed which makes them aware of the 
problem), but other than that, I don't see a need for 
changes to the strategy.
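
For readers unfamiliar with the declaration PEP 263 asks for, a small
sketch follows. It assumes present-day Python 3, where compile() applies
the coding comment when handed bytes; the file contents, the variable name
greeting, and the "<example>" filename are invented for illustration.

```python
# Source text of a small module whose first line is a PEP 263 coding comment.
source_text = (
    "# -*- coding: euc-jp -*-\n"
    'greeting = "\u3053\u3093\u306b\u3061\u306f"\n'
)

# Simulate the file as it would sit on disk: raw EUC-JP bytes.
source_bytes = source_text.encode("euc_jp")

# When given bytes, compile() reads the coding comment and decodes the
# rest of the source accordingly, so the string literal is interpreted
# in the declared encoding.
namespace = {}
exec(compile(source_bytes, "<example>", "exec"), namespace)
greeting = namespace["greeting"]
```

The literal comes back as the intended five-character greeting because
the source states its own encoding on the first line.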

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/