
I'd like to work on adding support for non-ASCII characters in identifiers, using the following principles:
1. At run-time, identifiers are represented as Unicode objects unless they are pure ASCII. IOW, they are converted from the source encoding to Unicode objects in the process of parsing.
2. As a consequence of 1), all places where identifiers appear need to support Unicode objects (e.g. __dict__, __getattr__, etc.).
3. Legal non-ASCII identifiers are the same as legal non-ASCII identifiers in Java, except that Python may use a different version of the Unicode character database. Like Java, Python would have the property that future versions allow more characters in identifiers than older versions.
If you are too lazy to look up the Java definition, here is a rough overview: An identifier is "JavaLetter JavaLetterOrDigit*"
JavaLetter is a character of the classes Lu, Ll, Lt, Lm, or Lo, or a currency symbol (for Python: excluding $), or a connecting punctuation character (which is unfortunately underspecified - will research the implementation).
JavaLetterOrDigit is a JavaLetter, or a digit, a numeric letter, a combining mark, a non-spacing mark, or an ignorable control character.
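To make the rough overview above concrete, here is a minimal sketch of the check in Python, using the general categories from the unicodedata module. The function names (is_identifier_start, is_identifier_part, is_identifier) are made up for illustration and are not part of any existing API. Two assumptions: "connecting punctuation" is taken to mean category Pc, and "ignorable control character" is approximated by category Cf. Both may differ from what the Java implementation actually does.

```python
import unicodedata

# Categories allowed for the first character (the "JavaLetter" above):
# letters, currency symbols, and connecting punctuation.
# Assumption: "connecting punctuation" means general category Pc.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Sc", "Pc"}

# Additional categories allowed after the first character: digits,
# numeric letters, combining marks, and non-spacing marks.
# Assumption: "ignorable control character" is approximated by Cf.
PART_CATEGORIES = START_CATEGORIES | {"Nd", "Nl", "Mc", "Mn", "Cf"}


def is_identifier_start(ch):
    """True if ch may start an identifier under the rough rules above."""
    # "$" is a currency symbol, but it is excluded for Python.
    return ch != "$" and unicodedata.category(ch) in START_CATEGORIES


def is_identifier_part(ch):
    """True if ch may appear after the first character."""
    return ch != "$" and unicodedata.category(ch) in PART_CATEGORIES


def is_identifier(name):
    """True if name matches "JavaLetter JavaLetterOrDigit*"."""
    if not name:
        return False
    return is_identifier_start(name[0]) and all(
        is_identifier_part(ch) for ch in name[1:]
    )
```

Under these assumptions, is_identifier("löwis") and is_identifier("_x1") are true, while is_identifier("1x") and is_identifier("a$b") are false. Because the answer depends on the Unicode database that unicodedata ships with, a later version may accept names that an earlier one rejects, which is the property described in point 3.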
Does this need a PEP?
Sure does. Since this could create a serious burden for code portability, I'd like to see a serious section on motivation and discussion on how to keep Unicode out of the standard library and out of most 3rd party distributions. Without that I'm strongly -1.

--Guido van Rossum (home page: http://www.python.org/~guido/)