[Python-checkins] r55399 - peps/trunk/pep-3131.txt

Thu May 17 11:01:44 CEST 2007

Author: martin.v.loewis
Date: Thu May 17 11:01:43 2007
New Revision: 55399

Modified:
   peps/trunk/pep-3131.txt
Log:
Include Other_ID_{Start|Continue}.


Modified: peps/trunk/pep-3131.txt
==============================================================================

--- peps/trunk/pep-3131.txt	(original)
+++ peps/trunk/pep-3131.txt	Thu May 17 11:01:43 2007
@@ -68,17 +68,19 @@
 
 The identifier syntax is ``<ID_Start> <ID_Continue>*``.
 
-``ID_Start`` is defined as all characters having one of the general categories
-uppercase letters (Lu), lowercase letters (Ll), titlecase letters (Lt), modifier
-letters (Lm), other letters (Lo), letter numbers (Nl), plus the underscore (XXX
-what are "stability extensions" listed in UAX 31).
-
-``ID_Continue`` is defined as all characters in ``ID_Start``, plus nonspacing
-marks (Mn), spacing combining marks (Mc), decimal number (Nd), and connector
-punctuations (Pc).
+``ID_Start`` is defined as all characters having one of the general
+categories uppercase letters (Lu), lowercase letters (Ll), titlecase
+letters (Lt), modifier letters (Lm), other letters (Lo), letter
+numbers (Nl), the underscore, and characters carrying the
+Other_ID_Start property.
+
+``ID_Continue`` is defined as all characters in ``ID_Start``, plus
+nonspacing marks (Mn), spacing combining marks (Mc), decimal number
+(Nd), connector punctuations (Pc), and characters carrying the
+Other_ID_Continue property.
 
-All identifiers are converted into the normal form NFC while parsing; comparison
-of identifiers is based on NFC.
+All identifiers are converted into the normal form NFC while parsing;
+comparison of identifiers is based on NFC.
 
 Policy Specification
 ====================
@@ -97,18 +99,19 @@
 
 The following changes will need to be made to the parser:
 
-1. If a non-ASCII character is found in the UTF-8 representation of the source
-   code, a forward scan is made to find the first ASCII non-identifier character
-   (e.g. a space or punctuation character)
-
-2. The entire UTF-8 string is passed to a function to normalize the string to
-   NFC, and then verify that it follows the identifier syntax. No such callout
-   is made for pure-ASCII identifiers, which continue to be parsed the way they
-   are today.
-
-3. If this specification is implemented for 2.x, reflective libraries (such as
-   pydoc) must be verified to continue to work when Unicode strings appear in
-   ``__dict__`` slots as keys.
+1. If a non-ASCII character is found in the UTF-8 representation of
+   the source code, a forward scan is made to find the first ASCII
+   non-identifier character (e.g. a space or punctuation character)
+
+2. The entire UTF-8 string is passed to a function to normalize the
+   string to NFC, and then verify that it follows the identifier
+   syntax. No such callout is made for pure-ASCII identifiers, which
+   continue to be parsed the way they are today. The Unicode database
+   must start including the Other_ID_{Start|Continue} property.
+
+3. If this specification is implemented for 2.x, reflective libraries
+   (such as pydoc) must be verified to continue to work when Unicode
+   strings appear in ``__dict__`` slots as keys.
 
 References
 ==========
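
Note: the identifier rules in the first hunk can be sketched in Python
roughly as follows. This is a minimal illustration, not the parser
change the PEP describes. The helper names (is_id_start,
is_id_continue, is_identifier) are hypothetical, and because the
unicodedata module does not expose Other_ID_Start or
Other_ID_Continue, the two sets below are hard-coded assumptions
covering only a few well-known code points; a real implementation
would read them from the PropList.txt file of the Unicode version in
use.

```python
import unicodedata

# General categories named in the PEP text for each position.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}

# ASSUMPTION: partial, hand-copied snapshots of the Other_ID_Start and
# Other_ID_Continue properties; newer Unicode versions list more.
OTHER_ID_START = {"\u2118", "\u212e", "\u309b", "\u309c"}
OTHER_ID_CONTINUE = {"\u00b7", "\u0387", "\u19da"} | {
    chr(cp) for cp in range(0x1369, 0x1372)
}


def is_id_start(ch):
    """True if ch may begin an identifier under the rules above."""
    return (
        ch == "_"
        or unicodedata.category(ch) in START_CATEGORIES
        or ch in OTHER_ID_START
    )


def is_id_continue(ch):
    """True if ch may appear after the first character."""
    return (
        is_id_start(ch)
        or unicodedata.category(ch) in CONTINUE_CATEGORIES
        or ch in OTHER_ID_CONTINUE
    )


def is_identifier(name):
    """Normalize to NFC, then check <ID_Start> <ID_Continue>*."""
    name = unicodedata.normalize("NFC", name)
    return (
        bool(name)
        and is_id_start(name[0])
        and all(is_id_continue(ch) for ch in name[1:])
    )
```

With these definitions, a MIDDLE DOT (U+00B7) is accepted inside an
identifier but not as its first character, which is the kind of case
the Other_ID_Continue property exists to cover.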