[Python-Dev] Allowing non-ASCII identifiers

Guido van Rossum guido at python.org
Wed Jan 14 15:16:56 EST 2004


> I'd like to work on adding support for non-ASCII characters
> in identifiers, using the following principles:
> 
> 1. At run-time, identifiers are represented as Unicode
>     objects unless they are pure ASCII. IOW, they are
>     converted from the source encoding to Unicode objects
>     in the process of parsing.
> 
> 2. As a consequence of 1), all places where identifiers
>     appear need to support Unicode objects (e.g. __dict__,
>     __getattr__, etc)
> 
> 3. Legal non-ASCII identifiers are what legal non-ASCII
>     identifiers are in Java, except that Python may use
>     a different version of the Unicode character database.
>     Python would share the property that future versions
>     allow more characters in identifiers than older versions.
> 
>     If you are too lazy to look up the Java definition,
>     here is a rough overview:
>     An identifier is "JavaLetter JavaLetterOrDigit*"
> 
>     JavaLetter is a character of the classes Lu, Ll,
>     Lt, Lm, or Lo, or a currency symbol (for Python:
>     excluding $), or a connecting punctuation character
>     (which is unfortunately underspecified - will
>      research the implementation).
> 
>     JavaLetterOrDigit is a JavaLetter, or a digit,
>     a numeric letter, a combining mark, a non-spacing
>     mark, or an ignorable control character.
> 
> Does this need a PEP?

Sure does.  Since this could create a serious burden for code
portability, I'd like to see a serious section on motivation and
discussion on how to keep Unicode out of the standard library and out
of most 3rd party distributions.  Without that I'm strongly -1.

--Guido van Rossum (home page: http://www.python.org/~guido/)
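The Java-style rule in point 3 of the quoted proposal ("JavaLetter JavaLetterOrDigit*") can be sketched with Python's standard `unicodedata` module. This is a minimal, hedged approximation, not the proposal's implementation. It assumes that "currency symbol" maps to category Sc, "connecting punctuation" to Pc, "digit" to Nd, "numeric letter" to Nl, "combining mark" to Mc, and "non-spacing mark" to Mn. It approximates "ignorable control character" by format characters (Cf) only. The function names are hypothetical.

```python
import unicodedata

# Assumed categories for "JavaLetter": letters (Lu, Ll, Lt, Lm, Lo),
# currency symbols (Sc) and connecting punctuation (Pc, e.g. "_").
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Sc", "Pc"}

# Assumed extra categories for "JavaLetterOrDigit": decimal digits (Nd),
# letter numbers (Nl), spacing combining marks (Mc), non-spacing marks (Mn),
# and format characters (Cf) as a rough stand-in for "ignorable controls".
CONTINUE_CATEGORIES = START_CATEGORIES | {"Nd", "Nl", "Mc", "Mn", "Cf"}


def is_identifier_start(ch: str) -> bool:
    # The proposal excludes "$" even though it is a currency symbol (Sc).
    return ch != "$" and unicodedata.category(ch) in START_CATEGORIES


def is_identifier_part(ch: str) -> bool:
    return ch != "$" and unicodedata.category(ch) in CONTINUE_CATEGORIES


def is_proposed_identifier(name: str) -> bool:
    """Return True if name matches JavaLetter JavaLetterOrDigit* (sketch)."""
    if not name:
        return False
    return is_identifier_start(name[0]) and all(
        is_identifier_part(ch) for ch in name[1:]
    )
```

Under these assumptions, "größe" and "_x1" are accepted while "1abc", "$x" and "a-b" are rejected. Note that Python 3 later adopted a different rule in PEP 3131 (based on the Unicode XID_Start/XID_Continue properties), which does not admit currency symbols: "€uro".isidentifier() is False there, whereas this sketch accepts it.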


