PEP 3131: Supporting Non-ASCII Identifiers

Thu May 17 11:13:38 EDT 2007

>     I'd suggest restricting identifiers under the rules of UTS-39,
> profile 2, "Highly Restrictive".  This limits mixing of scripts
> in a single identifier; you can't mix Hebrew and ASCII, for example,
> which prevents problems with mixing right to left and left to right
> scripts.  Domain names have similar restrictions.

That sounds interesting; however, I cannot find the document
you refer to. In TR 39 (also called Unicode Technical Standard #39),
at http://unicode.org/reports/tr39/, there is no mention
of numbered profiles or of "Highly Restrictive".

Looking at the document, it seems that section 3.1, "General Security
Profile for Identifiers", might apply. IIUC, xidmodifications.txt would
have to be taken into account.

I'm not quite sure what that means; apparently, a number of
characters (listed as restricted) should not be used in
identifiers. OTOH, it also adds HYPHEN-MINUS and KATAKANA
MIDDLE DOT - which surely shouldn't apply to Python
identifiers, no? (at least HYPHEN-MINUS already has a meaning
in Python, and cannot possibly be part of an identifier).
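
To illustrate the HYPHEN-MINUS point, here is a small sketch in
Python 3 syntax. The helper name describe() is made up for this
example; str.isidentifier() and ast.parse() are standard library
calls. It only shows how the parser reads the text and says nothing
about what TR 39 intends.

```python
import ast


def describe(source):
    """Report how Python reads `source` (hypothetical helper).

    Returns "identifier" if the whole string is a single valid
    identifier, otherwise the name of the top-level expression node.
    """
    if source.isidentifier():
        return "identifier"
    tree = ast.parse(source, mode="eval")
    return type(tree.body).__name__


plain_result = describe("foo_bar")   # "identifier"
hyphen_result = describe("foo-bar")  # "BinOp": parsed as foo minus bar
```

So "foo-bar" is always a subtraction, and allowing HYPHEN-MINUS
inside identifiers would change the meaning of existing code.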

Also, mixed-script detection might be considered, but it is
not clear to me how to interpret the algorithm in section
5; moreover, the document says that this is just one of the
possible algorithms.
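
To make the idea concrete, here is a rough sketch of what a
mixed-script check could look like. It is NOT the algorithm from
TR 39: the unicodedata module does not expose the Script property,
so this guesses the script from the first word of the character
name, which is only an approximation. The function names
scripts_used() and is_mixed_script() are invented for the example.

```python
import unicodedata


def scripts_used(identifier):
    """Approximate the set of scripts used by the letters in `identifier`.

    Assumption: the first word of the Unicode character name
    ("LATIN", "HEBREW", "CYRILLIC", ...) stands in for the script.
    Non-letters (digits, underscore, marks) are ignored.
    """
    scripts = set()
    for ch in identifier:
        if not unicodedata.category(ch).startswith("L"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def is_mixed_script(identifier):
    """True if the letters appear to come from more than one script."""
    return len(scripts_used(identifier)) > 1


ascii_only = is_mixed_script("spam_1")           # False
hebrew_and_ascii = is_mixed_script("a\u05d0")    # True: LATIN + HEBREW
```

A real implementation would need the Script property data, and
would have to decide how to treat characters that are shared
between scripts.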

Finally, Confusable Detection is difficult to perform on
a single identifier - it seems you need two of them to
find out whether they are confusable.
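
A sketch of why two identifiers are needed: each name is reduced
to a "skeleton" and two names count as confusable when they differ
but their skeletons are equal. The three-entry table below is a
hand-written stand-in for illustration only, not the actual TR 39
data, and the names skeleton() and confusable() are hypothetical.

```python
import unicodedata

# Tiny illustrative table (an assumption, not the real data file):
# Cyrillic letters that look like Latin ones.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def skeleton(identifier):
    """Map each character to its look-alike prototype, if any."""
    decomposed = unicodedata.normalize("NFD", identifier)
    mapped = "".join(LOOKALIKES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(first, second):
    """True if the two names differ but share the same skeleton."""
    return first != second and skeleton(first) == skeleton(second)


same_look = confusable("scope", "sc\u043ep\u0435")  # True
different = confusable("scope", "score")            # False
```

Note that a check on a single identifier has nothing to compare
against; in practice one would compare each new name with the
other names already in scope.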

In any case, I added this as an open issue to the PEP.

Regards,
Martin