[Python-Dev] PEP 263 considered faulty (for some Japanese)

Tue, 12 Mar 2002 18:55:19 +0100

Guido van Rossum wrote:
> 
> > My position on this is *not* to introduce more defaults -- explicit
> > is better than implicit and in this particular case (encodings)
> > it'll result in a net win.
> 
> I'd like to believe you.  But the fact that apparently there are
> Japanese users who are willing to give up part of the language and
> library just so they can have a certain default, suggests that
> the need for defaults is a strong force.  Maybe we would've been
> better off leaving sys.setdefaultencoding() enabled -- then those
> people might have put sys.setdefaultencoding("utf-16") at the top of
> their program rather than hacking site.py... :-(

(Actually, they'll tweak sitecustomize.py.) It's good that they
can only apply the change in this one location. Placing it inside
the various modules would cause a maintenance nightmare, removing
one of the great advantages of Python over other languages.

Anyway, I tend to believe that changes to the default encoding
are only rarely needed and then only to overcome problems with
occasional use of Unicode.

It's a myth that you can port a program to Unicode by tweaking
the default encoding to fit your environment alone. 

A true port will have to follow the Unicode object through the
complete processing chain and apply the needed changes along the
way (which is a lot of work, but certainly possible). A different
strategy would be to treat text data as binary data and not
use Unicode at all. It all depends on the application scope.
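
To illustrate what following the chain means, here is a minimal
sketch in present-day Python 3 syntax (not the Python 2.2 under
discussion). The helper names and the choice of Shift_JIS as the
external encoding are assumptions made up for this example: bytes
are decoded once on input with an explicit encoding, processed as
Unicode, and encoded once on output, so no default encoding is
ever consulted.

```python
# Hypothetical helpers; none of these names come from the thread.

def read_text(raw: bytes, encoding: str) -> str:
    """Input boundary: decode bytes with an explicitly named encoding."""
    return raw.decode(encoding)


def process(text: str) -> str:
    """Middle of the chain: operate on Unicode text only."""
    return text.upper()


def write_text(text: str, encoding: str) -> bytes:
    """Output boundary: encode with an explicitly named encoding."""
    return text.encode(encoding)


# Assumed scenario: data arrives as Shift_JIS and leaves as UTF-8.
raw_in = "python は楽しい".encode("shift_jis")
raw_out = write_text(process(read_text(raw_in, "shift_jis")), "utf-8")
```

The binary-data strategy would instead pass raw_in through
untouched as bytes and never call decode() at all.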

> > > In the light of the post by Atsuo Ishimoto and the responses from both
> > > Marc-Andre Lemburg and Martin von Loewis, however, I'm not sure
> > > whether Suzuki Hisao's response represents the Japanese community, and
> > > it's possible that nothing needs to be done.
> >
> > Well, users using non-ASCII coding in their source files
> > should start to be explicit about the encoding (in phase 1
> > they'll get a warning printed which makes them aware of the
> > problem), but other than that, I don't see a need for
> > changes to the strategy.
> 
> Suzuki won't get the warning, because his source files are pure ASCII
> -- but his Unicode string literals will be interpreted as utf-16,
> which will break his programs.  The question is, do we care about him
> and others like him, or do we decide that their habits are bad for
> them and they have to change them?

We do care (after all, the PEP was designed for non-ASCII
users), but it was never intended that we allow encodings
like UTF-16 to be used for Python source code. 

I'm afraid there's not much we can do. For UTF-8 they would
just have to add a single coding comment to each source file,
but there's nothing we can offer them for UTF-16.
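
For reference, a minimal sketch of what such a coding comment
looks like and how it is picked up, written against present-day
Python 3 rather than the interpreter of this thread. The sample
source text and variable names are assumptions for illustration.
PEP 263 only allows ASCII-compatible encodings, because the
comment itself has to be readable before the encoding is known;
the last part shows the same text encoded as UTF-16 being
rejected.

```python
import io
import tokenize

# A tiny source file with a PEP 263 coding comment on its first line.
source_text = '# -*- coding: utf-8 -*-\ngreeting = "こんにちは"\n'
utf8_source = source_text.encode("utf-8")

# The standard library can report which encoding the comment declares.
declared, _ = tokenize.detect_encoding(io.BytesIO(utf8_source).readline)

# Compiling the raw bytes honours the declaration.
namespace = {}
exec(compile(utf8_source, "<example>", "exec"), namespace)

# The same text as UTF-16 contains NUL bytes and is refused outright.
utf16_rejected = False
try:
    compile(source_text.encode("utf-16"), "<example-utf16>", "exec")
except (ValueError, SyntaxError):
    utf16_rejected = True
```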

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/