[Python-checkins] r57051 - peps/trunk/pep-3131.txt

martin.v.loewis python-checkins at python.org
Wed Aug 15 09:50:22 CEST 2007


Author: martin.v.loewis
Date: Wed Aug 15 09:50:22 2007
New Revision: 57051

Modified:
   peps/trunk/pep-3131.txt
Log:
Explain XID_Start and XID_Continue properly;
refer to DerivedCoreProperties.


Modified: peps/trunk/pep-3131.txt
==============================================================================
--- peps/trunk/pep-3131.txt	(original)
+++ peps/trunk/pep-3131.txt	Wed Aug 15 09:50:22 2007
@@ -71,16 +71,26 @@
 
 The identifier syntax is ``<XID_Start> <XID_Continue>*``.
 
-``XID_Start`` is defined as all characters having one of the general
+The exact specification of what characters have the XID_Start or
+XID_Continue properties can be found in the DerivedCoreProperties
+file of the Unicode data in use by Python (4.1 at the time this
+PEP was written), see [6]_. For reference, the construction rules
+for these sets are given below. The XID_ properties are derived
+from ID_Start/ID_Continue, which are derived themselves.
+
+``ID_Start`` is defined as all characters having one of the general
 categories uppercase letters (Lu), lowercase letters (Ll), titlecase
 letters (Lt), modifier letters (Lm), other letters (Lo), letter
 numbers (Nl), the underscore, and characters carrying the
-Other_ID_Start property (XXX adjust for XID_Start).
+Other_ID_Start property. ``XID_Start`` then closes this set under
+normalization, by removing all characters whose NFKC normalization
+is not of the form ID_Start ID_Continue* anymore.
 
-``XID_Continue`` is defined as all characters in ``XID_Start``, plus
+``ID_Continue`` is defined as all characters in ``ID_Start``, plus
 nonspacing marks (Mn), spacing combining marks (Mc), decimal number
 (Nd), connector punctuations (Pc), and characters carrying the
-Other_ID_Continue property (XXX adjust for XID_Continue).
+Other_ID_Continue property. Again, ``XID_Continue`` closes this set
+under NFKC-normalization; it also adds U+00B7 to support Catalan.
 
 All identifiers are converted into the normal form NFKC while parsing;
 comparison of identifiers is based on NFKC.
@@ -251,6 +261,7 @@
 .. [3] http://www.unicode.org/reports/tr36/
 .. [4] http://mail.python.org/pipermail/python-3000/2007-June/008161.html
 .. [5] http://mail.python.org/pipermail/python-3000/2007-May/007925.html
+.. [6] http://www.unicode.org/Public/4.1.0/ucd/DerivedCoreProperties.txt
 
 Copyright
 =========
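
The construction rules described in the patch above can be sketched in Python with the standard ``unicodedata`` module. This is only an approximation under stated assumptions: the function names are hypothetical, the Other_ID_Start and Other_ID_Continue properties are not exposed by ``unicodedata`` and are therefore ignored, and the results depend on the Unicode version bundled with the running interpreter rather than on the 4.1 data the PEP refers to. The DerivedCoreProperties file remains the authoritative source.

```python
import unicodedata

# General categories named in the PEP text for ID_Start and ID_Continue.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}


def approx_id_start(ch):
    """Approximate ID_Start; Other_ID_Start characters are NOT covered."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def approx_id_continue(ch):
    """Approximate ID_Continue; Other_ID_Continue characters are NOT covered."""
    return (approx_id_start(ch)
            or unicodedata.category(ch) in ID_CONTINUE_EXTRA_CATEGORIES)


def nfkc_keeps_identifier_form(ch):
    """True if NFKC(ch) still has the form ID_Start ID_Continue*."""
    normalized = unicodedata.normalize("NFKC", ch)
    return (len(normalized) > 0
            and approx_id_start(normalized[0])
            and all(approx_id_continue(c) for c in normalized[1:]))


def approx_xid_start(ch):
    """Approximate XID_Start: ID_Start minus characters whose NFKC form
    is no longer a valid identifier."""
    return approx_id_start(ch) and nfkc_keeps_identifier_form(ch)


# U+037A (GREEK YPOGEGRAMMENI) has category Lm, but its NFKC form starts
# with a space, so the closure step removes it from the start set.
print(approx_id_start("\u037a"), approx_xid_start("\u037a"))
```

On a Python 3 interpreter, ``str.isidentifier()`` can serve as a cross-check against the sketch, since it reflects the identifier rules the interpreter actually applies.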

