[Python-Dev] Allowing non-ASCII identifiers

Wed Jan 14 15:08:58 EST 2004

I'd like to work on adding support for non-ASCII characters
in identifiers, using the following principles:

1. At run-time, identifiers are represented as Unicode
    objects unless they are pure ASCII. IOW, they are
    converted from the source encoding to Unicode objects
    in the process of parsing.

2. As a consequence of 1), all places where identifiers
    appear need to support Unicode objects (e.g. __dict__,
    __getattr__, etc.).

3. The legal non-ASCII identifiers are the same as the legal
    non-ASCII identifiers in Java, except that Python may use
    a different version of the Unicode character database.
    Python would share with Java the property that future
    versions allow more characters in identifiers than older
    versions.

    If you are too lazy to look up the Java definition,
    here is a rough overview:
    An identifier is "JavaLetter JavaLetterOrDigit*"

    JavaLetter is a character of the classes Lu, Ll,
    Lt, Lm, or Lo, or a currency symbol (for Python:
    excluding $), or a connecting punctuation character
    (which is unfortunately underspecified; I will
     research the implementation).

    JavaLetterOrDigit is a JavaLetter, or a digit,
    a numeric letter, a combining mark, a non-spacing
    mark, or an ignorable control character.
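
The rough overview in 3) can be sketched in Python with the
unicodedata module. This is only an illustration of the rule as
summarized above, not the Java implementation and not a proposed
patch. The function names are hypothetical, and mapping "ignorable
control character" to category Cf is an assumption; the results
also depend on the Unicode database version that Python ships.

```python
import unicodedata

# General categories named in the rough overview:
# letters (Lu, Ll, Lt, Lm, Lo), currency symbols (Sc),
# connecting punctuation (Pc).
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Sc", "Pc"}

# Nd = digit, Nl = numeric letter, Mc = combining mark,
# Mn = non-spacing mark.
# ASSUMPTION: Cf stands in for "ignorable control character".
PART_CATEGORIES = START_CATEGORIES | {"Nd", "Nl", "Mc", "Mn", "Cf"}


def is_identifier_start(ch):
    """Hypothetical helper: may ch begin an identifier?"""
    if ch == "$":  # the proposal excludes $ for Python
        return False
    return unicodedata.category(ch) in START_CATEGORIES


def is_identifier_part(ch):
    """Hypothetical helper: may ch appear after the first character?"""
    if ch == "$":
        return False
    return unicodedata.category(ch) in PART_CATEGORIES


def is_proposed_identifier(name):
    """Check name against "JavaLetter JavaLetterOrDigit*"."""
    if not name:
        return False
    return is_identifier_start(name[0]) and all(
        is_identifier_part(ch) for ch in name[1:]
    )
```

Under these assumptions, is_proposed_identifier("größe") and
is_proposed_identifier("_x1") are true, while "9lives", "a$" and
"a b" are rejected. Keywords are not considered here.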

Does this need a PEP?

Regards,
Martin