Multibyte Character Support for Python

Tue May 14 04:13:24 EDT 2002

Kragen Sitaker <kragen at pobox.com> writes:

> I don't fully understand all the issues here, but I don't think that
> pointing out that Stephen is the only person who holds a particular
> opinion necessarily suggests that he is wrong.  

I'm not suggesting that he is 'wrong'; this specific question (how to
deal with source code encodings in programming languages) is not one
that has a single objective 'right' answer.

Instead, it is a matter of judgement, based on criteria, which might
be both technical and political. I'm just suggesting that few people
seem to have the same criteria, or, at least when applying them to the
specific question, come to the same conclusion.

> I believe Stephen is the only person here who regularly writes in a
> language that is written in a non-Latin character set --- Japanese,
> in his case.  Also, although I am not certain of this, I think he
> has worked on the internationalization support in XEmacs.

Yes, I appreciate all that.

> About providing users with options --- is it possible that these
> options could mean I couldn't recompile your Python code if I don't
> have code to support the particular encoding you wrote it in?  

Yes, that is the case.
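
To illustrate why, here is a minimal sketch in present-day Python. The
helper name `encoding_available` and the bogus encoding name are made up
for this example; `codecs.lookup` is the standard call that raises
LookupError when no codec is registered under a given name. A file
declared in an encoding that fails this check could not be decoded, and
so could not be compiled, on that installation.

```python
import codecs

def encoding_available(name):
    """Return True if this Python installation has a codec for `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

# ISO-8859-15 ships with the standard library.
print(encoding_available("iso-8859-15"))

# A deliberately bogus name, standing in for an encoding you lack.
print(encoding_available("x-no-such-encoding"))
```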

> How about cutting and pasting code between modules written in
> different encodings, either in an editor that didn't support Unicode
> or didn't support one of the encodings correctly?

That is completely a matter of your editor. If the editor doesn't
support one of your encodings, it cannot display the source code
correctly.

If so, there is a good chance that it could not display the source code
correctly even if the file used a different encoding.

For IDLE, if the source is displayed correctly, you will certainly be
able to copy arbitrary text. However, if you paste text that cannot be
represented in the specified encoding, you may no longer be able to
save the file in that encoding.
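
As a rough illustration of that last point, one can test whether text is
representable in an encoding by trying to encode it. This is only a
sketch in present-day Python, not IDLE's actual save logic, and the
helper name `can_save_as` is invented for the example.

```python
def can_save_as(text, encoding):
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

latin_text = "caf\u00e9"             # e-acute exists in ISO-8859-1
pasted_text = "\u65e5\u672c\u8a9e"   # Japanese text, absent from ISO-8859-1

print(can_save_as(latin_text, "iso-8859-1"))
print(can_save_as(pasted_text, "iso-8859-1"))
print(can_save_as(pasted_text, "utf-8"))
```

An editor in this situation would have to refuse the save, ask for a
different encoding, or replace the offending characters.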

> About using "recode" to support existing e.g. ISO-8859-15 code.  If I
> am not mistaken, that code can presently only contain ISO-8859-15
> inside of byte strings and Unicode strings.  Python 2.1 seems to
> assume ISO-8859-1 for Unicode string contents.  Would it be sufficient
> to recode the contents of Unicode strings?

I don't think I understand the question. Are you talking about the GNU
recode utility?

Python code can contain non-ASCII characters in byte string literals,
Unicode string literals, and comments. For recoding, all of those places need
to be recoded, or else no editor in the world will be able to display
the file correctly.
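
A minimal sketch of recoding a whole file rather than only the Unicode
literals, in present-day Python. The function name `recode_source` and
the sample bytes are invented for illustration; this is not how GNU
recode is implemented. The file is treated as one block of text, so
comments and all literals are converted together.

```python
def recode_source(data, old_encoding, new_encoding):
    """Recode an entire source file held as bytes, comments and literals alike."""
    return data.decode(old_encoding).encode(new_encoding)

# Sample source in ISO-8859-15, where byte 0xA4 is the euro sign.
original = b"# price in \xa4\ns = 'co\xfbt: 5 \xa4'\n"
recoded = recode_source(original, "iso-8859-15", "utf-8")

# Count the UTF-8 euro signs: one from the comment, one from the literal.
euro_utf8 = "\u20ac".encode("utf-8")
print(recoded.count(euro_utf8))
```

Had only the string literal been converted, the comment would still hold
a lone 0xA4 byte, and the file would no longer be valid in either
encoding.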

Regards,
Martin