[Python-checkins] r57051 - peps/trunk/pep-3131.txt
martin.v.loewis
python-checkins at python.org
Wed Aug 15 09:50:22 CEST 2007
Author: martin.v.loewis
Date: Wed Aug 15 09:50:22 2007
New Revision: 57051
Modified:
peps/trunk/pep-3131.txt
Log:
Explain XID_Start and XID_Continue properly;
refer to DerivedCoreProperties.
Modified: peps/trunk/pep-3131.txt
==============================================================================
--- peps/trunk/pep-3131.txt (original)
+++ peps/trunk/pep-3131.txt Wed Aug 15 09:50:22 2007
@@ -71,16 +71,26 @@
The identifier syntax is ``<XID_Start> <XID_Continue>*``.
-``XID_Start`` is defined as all characters having one of the general
+The exact specification of what characters have the XID_Start or
+XID_Continue properties can be found in the DerivedCoreProperties
+file of the Unicode data in use by Python (4.1 at the time this
+PEP was written), see [6]_. For reference, the construction rules
+for these sets are given below. The XID_ properties are derived
+from ID_Start/ID_Continue, which are derived themselves.
+
+``ID_Start`` is defined as all characters having one of the general
categories uppercase letters (Lu), lowercase letters (Ll), titlecase
letters (Lt), modifier letters (Lm), other letters (Lo), letter
numbers (Nl), the underscore, and characters carrying the
-Other_ID_Start property (XXX adjust for XID_Start).
+Other_ID_Start property. ``XID_Start`` then closes this set under
+normalization, by removing all characters whose NFKC normalization
+is not of the form ID_Start ID_Continue* anymore.
-``XID_Continue`` is defined as all characters in ``XID_Start``, plus
+``ID_Continue`` is defined as all characters in ``ID_Start``, plus
nonspacing marks (Mn), spacing combining marks (Mc), decimal number
(Nd), connector punctuations (Pc), and characters carrying the
-Other_ID_Continue property (XXX adjust for XID_Continue).
+Other_ID_Continue property. Again, ``XID_Continue`` closes this set
+under NFKC-normalization; it also adds U+00B7 to support Catalan.
All identifiers are converted into the normal form NFKC while parsing;
comparison of identifiers is based on NFKC.
@@ -251,6 +261,7 @@
.. [3] http://www.unicode.org/reports/tr36/
.. [4] http://mail.python.org/pipermail/python-3000/2007-June/008161.html
.. [5] http://mail.python.org/pipermail/python-3000/2007-May/007925.html
+.. [6] http://www.unicode.org/Public/4.1.0/ucd/DerivedCoreProperties.txt
Copyright
=========
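
For readers of this change, the construction rules in the patched text can be approximated in Python as follows. This is only a sketch under stated assumptions: it uses the general categories listed in the PEP text, ignores the Other_ID_Start and Other_ID_Continue properties (so it does not cover U+00B7), and relies on whatever Unicode version the running interpreter's ``unicodedata`` module ships. All function names are hypothetical and appear neither in the PEP nor in CPython; the authoritative sets are those in DerivedCoreProperties.txt.

```python
import unicodedata

# General categories named in the PEP text (assumption: Other_ID_Start and
# Other_ID_Continue are ignored, so this is only an approximation).
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}


def approx_id_start(ch):
    """Approximate ID_Start: the listed categories plus the underscore."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def approx_id_continue(ch):
    """Approximate ID_Continue: ID_Start plus Mn, Mc, Nd and Pc."""
    return (approx_id_start(ch)
            or unicodedata.category(ch) in ID_CONTINUE_EXTRA_CATEGORIES)


def approx_xid_start(ch):
    """Keep an ID_Start character only if its NFKC form still looks like
    ID_Start ID_Continue*."""
    if not approx_id_start(ch):
        return False
    norm = unicodedata.normalize("NFKC", ch)
    return (bool(norm)
            and approx_id_start(norm[0])
            and all(approx_id_continue(c) for c in norm[1:]))


def approx_xid_continue(ch):
    """Keep an ID_Continue character only if its NFKC form consists of
    ID_Continue characters."""
    if not approx_id_continue(ch):
        return False
    norm = unicodedata.normalize("NFKC", ch)
    return bool(norm) and all(approx_id_continue(c) for c in norm)


def approx_is_identifier(name):
    """Check a string against <XID_Start> <XID_Continue>* (approximation)."""
    return (bool(name)
            and approx_xid_start(name[0])
            and all(approx_xid_continue(c) for c in name[1:]))


def normalize_identifier(name):
    """Identifiers are converted to NFKC while parsing."""
    return unicodedata.normalize("NFKC", name)
```

As an illustration of the closure step, U+037A (GREEK YPOGEGRAMMENI) has category Lm and so passes the ID_Start test, but its NFKC form begins with a space, so the XID_Start test rejects it. In Python 3, ``str.isidentifier()`` provides the interpreter's own check and is the better choice outside of illustration.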