[Python-3000] PEP: Supporting Non-ASCII Identifiers

Mon Jun 4 13:11:50 CEST 2007

On Sun, 3 Jun 2007, "Martin v. Löwis" wrote:
> >> All identifiers are converted into the normal form NFC while parsing;
> >
> > Actually, shouldn't the whole file be converted to NFC, instead of
> > only identifiers? If you have decomposable characters in strings and
> > your editor decides to normalize them to a different form than in the
> > original source, the meaning of the code will change when you save
> > without you noticing anything.
>
> Sure - but how can Python tell whether a non-normalized string was
> intentionally put into the source, or as a side effect of the editor
> modifying it?

It seems to me the simplest thing to do is to require that Python
source files be normalized (to NFC, the same form proposed for
identifiers).  Then the ambiguity just goes away.
Everyone knows what form their files should be in, and if you really
need to construct a non-normalized string, you can do that explicitly
using "\u" notation.
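
To make that concrete, here is a rough sketch of the kind of check I
have in mind.  It assumes NFC is the required form and Python 3 string
semantics; the helper name is_nfc is made up for illustration and is not
part of any proposal.

```python
import unicodedata

def is_nfc(text):
    """Return True if text is already in normal form NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", text) == text

# The same visible character, spelled two ways.
composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# A source line holding the literal composed character is normalized ...
source_nfc = 's = "' + composed + '"'
# ... the same line with the literal decomposed sequence is not, so a
# compiler enforcing the rule would reject it.
source_nfd = 's = "' + decomposed + '"'

# Writing the decomposed string with an escape keeps the source text pure
# ASCII, so the file stays normalized while the string value does not.
source_escaped = r's = "e\u0301"'

for line in (source_nfc, source_nfd, source_escaped):
    print(ascii(line), is_nfc(line))
```

The point of the last case is that an editor renormalizing the file
cannot silently change the string, because the non-normalized sequence
only exists after the escape is evaluated.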

-- ?ng