[I18n-sig] Pre-PEP: Proposed Python Character Model

Paul Prescod paulp@ActiveState.com
Wed, 07 Feb 2001 11:51:51 -0800


Hooper Brian wrote:
> 
> ...
> 
> As someone who is frequently using Python with Japanese
> from day to day, I'd just like to offer that I think that
> most Japanese users are not philosophically opposed to
> Unicode, they would just like support for Unicode to have
> as little impact as possible on older
> pre-Unicode-support code.  One fairly extended discussion
> on this list concerned how to allow for an encoding
> default other than UTF-8, since a lot of programs here
> are written to handle EUC and SJIS directly as byte-string
> literals.

In my opinion there should be *no* encoding default. New code should
always specify an encoding. Old code should continue to work the same.
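A minimal sketch of this "no default" rule in today's Python 3 terms. The helpers `read_text`, `write_text` and `read_bytes` are hypothetical (not part of any library); the only real API used is the built-in open(). The encoding is a required keyword argument, so forgetting it is an error rather than a silent guess, while old byte-oriented code keeps working through open(path, "rb"):

```python
import os
import tempfile


def read_text(path: str, *, encoding: str) -> str:
    """Hypothetical helper: decode a file; the caller MUST name the encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def write_text(path: str, text: str, *, encoding: str) -> None:
    """Hypothetical helper: encode text to a file with an explicit encoding."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def read_bytes(path: str) -> bytes:
    """Old-style access: raw bytes, no decoding at all, exactly as before."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.txt")
    write_text(path, "日本語", encoding="shift_jis")
    text = read_text(path, encoding="shift_jis")
    # 3 characters in memory vs. the raw Shift-JIS bytes stored on disk
    print(len(text), read_bytes(path))
    try:
        read_text(path)  # no encoding given: rejected instead of guessed
    except TypeError as exc:
        print("rejected:", exc)
```

Making the argument keyword-only with no default is one way to enforce the rule; Python 3's own open() instead falls back to a locale-dependent default when encoding is omitted.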

> ... What about adding an
> optional encoding argument to the existing open(),
> allowing encoding to be passed to that, and using 'raw' as
> the default format (what it does now)?

I'm not content to have a "default" in the long term. Users should just
choose their encodings. Why would your Japanese user prefer to work with
the raw bytes of their Shift-JIS instead of having it decoded into
Unicode characters? Forcing Asian users to hack bytes instead of characters
is exactly what we are trying to avoid! Shift-JIS and Unicode are not at odds.
Shift-JIS is a great *encoding* for Unicode (the abstract character
set). Shift-JIS is what should be on the disk. Unicode is what you
should be working with in memory. Of course there will always be some
corner cases where this does not hold, but that should be the general
model...
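To make the "Shift-JIS on disk, Unicode in memory" model concrete, here is a small sketch in today's Python 3, where `str` holds Unicode characters and `bytes` holds encoded data. The helper names `load` and `save` are hypothetical, and an in-memory io.BytesIO stands in for a real file. The idea: decode once at the input boundary, work with characters, encode once on output.

```python
import io


def load(stream: io.BufferedIOBase, encoding: str) -> str:
    """Input boundary: turn encoded bytes into Unicode characters."""
    return stream.read().decode(encoding)


def save(text: str, stream: io.BufferedIOBase, encoding: str) -> None:
    """Output boundary: turn Unicode characters back into encoded bytes."""
    stream.write(text.encode(encoding))


# Simulate a Shift-JIS file on disk with an in-memory byte stream.
disk = io.BytesIO("日本語のテキスト".encode("shift_jis"))

text = load(disk, "shift_jis")
# In memory we work with characters: slicing cannot split a character.
first_two = text[:2]

out = io.BytesIO()
save(first_two, out, "shift_jis")

# Contrast: slicing the *encoded* bytes can cut a two-byte character in half.
raw = "日本語".encode("shift_jis")  # 3 characters, 6 bytes
try:
    raw[:3].decode("shift_jis")  # ends on the lead byte of the 2nd character
    half_char_ok = True
except UnicodeDecodeError:
    half_char_ok = False
```

The byte-slicing failure at the end is the kind of bookkeeping that working in characters removes from everyday code.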

 Paul Prescod