[Python-Dev] Allowing non-ASCII identifiers
Guido van Rossum
guido at python.org
Wed Jan 14 15:16:56 EST 2004
> I'd like to work on adding support for non-ASCII characters
> in identifiers, using the following principles:
>
> 1. At run-time, identifiers are represented as Unicode
> objects unless they are pure ASCII. IOW, they are
> converted from the source encoding to Unicode objects
> in the process of parsing.
>
> 2. As a consequence of 1), all places where identifiers
> appear need to support Unicode objects (e.g. __dict__,
> __getattr__, etc.)
>
> 3. Legal non-ASCII identifiers are what legal non-ASCII
> identifiers are in Java, except that Python may use
> a different version of the Unicode character database.
> Python would share the property that future versions
> allow more characters in identifiers than older versions.
>
> If you are too lazy to look up the Java definition,
> here is a rough overview:
> An identifier is "JavaLetter JavaLetterOrDigit*"
>
> JavaLetter is a character of the classes Lu, Ll,
> Lt, Lm, or Lo, or a currency symbol (for Python:
> excluding $), or a connecting punctuation character
> (which is unfortunately underspecified - will
> research the implementation).
>
> JavaLetterOrDigit is a JavaLetter, or a digit,
> a numeric letter, a combining mark, a non-spacing
> mark, or an ignorable control character.
>
> Does this need a PEP?
Sure does. Since this could create a serious burden for code
portability, I'd like to see a serious section on motivation and
discussion on how to keep Unicode out of the standard library and out
of most 3rd party distributions. Without that I'm strongly -1.
--Guido van Rossum (home page: http://www.python.org/~guido/)
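
For readers who want to experiment with the rule quoted above, here is a minimal sketch of the "JavaLetter JavaLetterOrDigit*" overview using the Unicode general categories from the standard unicodedata module. It follows the rough overview in the quoted proposal, not the Java specification itself. The function names (is_start, is_part, looks_like_identifier) are hypothetical, and treating "ignorable control character" as category Cf is an assumption, since the proposal does not pin that down. Results also depend on the version of the Unicode character database shipped with the interpreter, as the proposal notes.

```python
import unicodedata

# Categories for the first character, per the rough overview quoted above:
# letters (Lu, Ll, Lt, Lm, Lo), currency symbols (Sc) and
# connecting punctuation (Pc).
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Sc", "Pc"}

# Later characters may also be digits (Nd), numeric letters (Nl),
# combining marks (Mc) and non-spacing marks (Mn).
# Assumption: "ignorable control character" is approximated by category Cf.
PART_CATEGORIES = START_CATEGORIES | {"Nd", "Nl", "Mc", "Mn", "Cf"}


def is_start(ch):
    """True if ch may begin an identifier under the sketched rule."""
    # The proposal excludes '$' for Python even though it is a currency symbol.
    return ch != "$" and unicodedata.category(ch) in START_CATEGORIES


def is_part(ch):
    """True if ch may appear after the first character."""
    return ch != "$" and unicodedata.category(ch) in PART_CATEGORIES


def looks_like_identifier(name):
    """Check a string against the sketched 'start part*' rule."""
    return bool(name) and is_start(name[0]) and all(is_part(c) for c in name[1:])
```

Under this sketch, looks_like_identifier("größe") and looks_like_identifier("_x1") hold, while "1abc", "a b" and "a$" are rejected. Currency symbols other than "$" are accepted as a first character, which is a direct consequence of the quoted rule.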