Multibyte Character Support for Python

Tue May 14 03:52:15 EDT 2002

martin at v.loewis.de (Martin v. Loewis) writes:
> "Stephen J. Turnbull" <stephen at xemacs.org> writes:
> > That doesn't mean we think that Python should prohibit writing
> > programs in arbitrary user-specified encodings.  Only that the
> > facility for transforming a non-Unicode program into Unicode should be
> > provided as a standard library facility, rather than part of the
> > language.
> 
> I believe that you are still the only one who voices this specific
> position. More often, you find the position that Python source code
> should be restricted to UTF-8, period. . . .
> Apart from you, nobody else agrees with the approach "let's make it
> part of the library instead of part of the language". To most users,
> the difference appears not to matter (including myself, except that I
> think making it part of the language simplifies maintenance of the
> feature).

I don't fully understand all the issues here, but I don't think that
pointing out that Stephen is the only person who holds a particular
opinion necessarily suggests that he is wrong.  I believe Stephen is
the only person here who regularly writes in a language that is
written in a non-Latin character set --- Japanese, in his case.  Also,
although I am not certain of this, I think he has worked on the
internationalization support in XEmacs.

> I don't consider it evil to provide users with options: If UTF-8 is
> technically superior (which I agree it is), it will become the default
> text encoding of the future, anyway, with or without this PEP. Notice
> that the PEP slightly favours UTF-8 over other encodings, due to
> support of the UTF-8 signature.

About providing users with options --- is it possible that these
options could mean I couldn't recompile your Python code if I don't
have code to support the particular encoding you wrote it in?  How
about cutting and pasting code between modules written in different
encodings, in an editor that either didn't support Unicode or didn't
support one of the encodings correctly?
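
To make the first question concrete, here is a minimal sketch in
present-day Python 3 (not the Python 2.x under discussion).  The helper
name codec_available and the encoding name "x-hypothetical-encoding"
are made up for illustration; codecs.lookup is the standard library
call, and it raises LookupError when no codec is registered under the
given name.

```python
import codecs

def codec_available(name):
    """Return True if this interpreter has a codec registered for `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        # No codec under that name: source declared in this encoding
        # could not be decoded here.
        return False
    return True

# A codec shipped with the standard library.
print(codec_available("iso-8859-15"))

# A made-up name standing in for an encoding the reader does not have.
print(codec_available("x-hypothetical-encoding"))
```

If the check fails for the encoding a module declares, the reader
cannot even turn the file into text, let alone compile it.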

About using "recode" to support existing code written in, e.g.,
ISO-8859-15: if I am not mistaken, that code can presently contain
ISO-8859-15 only inside byte strings and Unicode strings.  Python 2.1
seems to assume ISO-8859-1 for Unicode string contents.  Would it be
sufficient to recode the contents of Unicode strings?
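
A small sketch of why the assumed encoding matters, again in
present-day Python 3 rather than Python 2.1.  The byte 0xA4 is the
euro sign in ISO-8859-15 but the generic currency sign in ISO-8859-1,
so the same source byte yields different Unicode text depending on
which encoding is assumed.  The function name recode is hypothetical
and only mimics what a recoding tool would do to a piece of text; it
is not the interface of any particular program.

```python
raw = b"\xa4"  # a single byte as it might appear in a source file

as_latin1 = raw.decode("iso-8859-1")   # U+00A4 CURRENCY SIGN
as_latin9 = raw.decode("iso-8859-15")  # U+20AC EURO SIGN

def recode(data, source, target="utf-8"):
    """Reinterpret `data` from encoding `source` and re-encode as `target`.

    Hypothetical helper for illustration only.
    """
    return data.decode(source).encode(target)

# The same input byte produces different UTF-8 output depending on
# which source encoding is assumed.
utf8_if_latin1 = recode(raw, "iso-8859-1")
utf8_if_latin9 = recode(raw, "iso-8859-15")
print(as_latin1 == as_latin9)
print(utf8_if_latin1 == utf8_if_latin9)
```

Both comparisons print False: recoding gives the intended result only
when the source encoding is stated correctly.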