Re: [Python-Dev] Allowing non-ASCII identifiers

Jan. 14, 2004


      ...
I'd like to work on adding support for non-ASCII characters
in identifiers, using the following principles:
1. At run-time, identifiers are represented as Unicode
    objects unless they are pure ASCII. IOW, they are
    converted from the source encoding to Unicode objects
    in the process of parsing.
2. As a consequence of 1), all places where identifiers
    appear need to support Unicode objects (e.g. __dict__,
    __getattr__, etc)
3. Legal non-ASCII identifiers are what legal non-ASCII
    identifiers are in Java, except that Python may use
    a different version of the Unicode character database.
    Python would share the property that future versions
    allow more characters in identifiers than older versions.
If you are too lazy to look up the Java definition,
    here is a rough overview:
    An identifier is "JavaLetter JavaLetterOrDigit*"
JavaLetter is a character of the classes Lu, Ll,
    Lt, Lm, or Lo, or a currency symbol (for Python:
    excluding $), or a connecting punctuation character
    (which is unfortunately underspecified - will
     research the implementation).
JavaLetterOrDigit is a JavaLetter, or a digit,
    a numeric letter, a combining mark, a non-spacing
    mark, or an ignorable control character.
Does this need a PEP?

Sure does.  Since this could create a serious burden for code
portability, I'd like to see a serious section on motivation and
discussion on how to keep Unicode out of the standard library and out
of most 3rd party distributions.  Without that I'm strongly -1.

--Guido van Rossum (home page: http://www.python.org/~guido/)