[Python-3000] PEP: Supporting Non-ASCII Identifiers

"Martin v. Löwis" martin at v.loewis.de
Sun Jun 3 19:11:21 CEST 2007


>> All identifiers are converted into the normal form NFC while parsing;
> 
> Actually, shouldn't the whole file be converted to NFC, instead of
> only identifiers? If you have decomposable characters in strings and
> your editor decides to normalize them to a different form than in the
> original source, the meaning of the code will change when you save
> without you noticing anything.

Sure - but how can Python tell whether a non-normalized string was
put into the source intentionally, or appeared as a side effect of
the editor modifying it?

In most cases, it won't matter. If it does, it should be explicit in
the code, e.g. by putting an n() function around the string literal.
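To illustrate, such an n() helper is not an existing builtin; a minimal sketch of what it could look like, using the standard unicodedata module:

```python
import unicodedata

def n(s, form="NFC"):
    # Hypothetical helper (not a real builtin): makes the
    # normalization of a string literal explicit at the point of use.
    return unicodedata.normalize(form, s)

# "o" followed by a combining diaeresis (decomposed form, NFD) ...
decomposed = "Lo\u0308wis"
# ... composes to the single precomposed character U+00F6 under NFC.
print(n(decomposed) == "L\u00f6wis")
```

Wrapping the literal this way makes the intended form survive even if an editor re-saves the file in a different normalization.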

> It's always better to be explicit when you want to make invisible
> distinctions. In the rare cases anything but NFC is really needed you
> can do explicit conversion or use escapes. Having to add normalization
> calls around all unicode strings to code defensively is neither
> convenient nor obvious.

However, it typically isn't necessary, either.

Also, there is still room for subtle issues: e.g. concatenating
two normalized strings can produce a string that isn't normalized.
And in many cases, strings come from IO, not from source, so if it
is important that they are in NFC, you need to normalize anyway.

Regards,
Martin
