[Python-Dev] Allowing non-ASCII identifiers
"Martin v. Löwis"
martin at v.loewis.de
Wed Jan 14 15:08:58 EST 2004
I'd like to work on adding support for non-ASCII characters
in identifiers, using the following principles:
1. At run-time, identifiers are represented as Unicode
objects unless they are pure ASCII. IOW, they are
converted from the source encoding to Unicode objects
in the process of parsing.
2. As a consequence of 1), all places where identifiers
appear need to support Unicode objects (e.g. __dict__,
__getattr__, etc.)
3. Legal non-ASCII identifiers are what legal non-ASCII
identifiers are in Java, except that Python may use
a different version of the Unicode character database.
Python would share the property that future versions
allow more characters in identifiers than older versions.
If you are too lazy to look up the Java definition,
here is a rough overview:
An identifier is "JavaLetter JavaLetterOrDigit*"
JavaLetter is a character of the classes Lu, Ll,
Lt, Lm, or Lo, or a currency symbol (for Python:
excluding $), or a connecting punctuation character
(which is unfortunately underspecified - will
research the implementation).
JavaLetterOrDigit is a JavaLetter, or a digit,
a numeric letter, a combining mark, a non-spacing
mark, or an ignorable control character.
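
The rough overview above can be sketched with the unicodedata module.
This is only an illustration, not a proposed implementation. The function
and constant names are made up for the example, and mapping "ignorable
control character" to the general category Cf is an assumption, since the
overview does not pin that class down.

```python
import unicodedata

# General categories allowed for the first character, following the rough
# overview: letters (Lu, Ll, Lt, Lm, Lo), currency symbols (Sc), and
# connecting punctuation (Pc, which includes "_").
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Sc", "Pc"}

# Additional categories for later characters: digits (Nd), numeric letters
# (Nl), combining marks (Mc), and non-spacing marks (Mn). Treating
# "ignorable control character" as category Cf is an assumption of this
# sketch.
CONTINUE_CATEGORIES = START_CATEGORIES | {"Nd", "Nl", "Mc", "Mn", "Cf"}


def is_identifier_start(ch):
    # "$" is a currency symbol, but the proposal excludes it for Python.
    return ch != "$" and unicodedata.category(ch) in START_CATEGORIES


def is_identifier_part(ch):
    return ch != "$" and unicodedata.category(ch) in CONTINUE_CATEGORIES


def is_identifier(name):
    """Return True if name matches "JavaLetter JavaLetterOrDigit*"."""
    if not name:
        return False
    return is_identifier_start(name[0]) and all(
        is_identifier_part(ch) for ch in name[1:]
    )


if __name__ == "__main__":
    for candidate in ["Löwis", "x1", "_private", "1x", "$x", "a b"]:
        print(repr(candidate), is_identifier(candidate))
```

Which characters pass depends on the Unicode character database that the
running interpreter ships with (reported by unicodedata.unidata_version),
which is the version dependence mentioned in principle 3.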
Does this need a PEP?
Regards,
Martin