[Python-checkins] r55769 - peps/trunk/pep-3131.txt
python-checkins at python.org
Tue Jun 5 20:54:37 CEST 2007
Date: Tue Jun 5 20:54:31 2007
New Revision: 55769
Add ?!ng's roundup.
--- peps/trunk/pep-3131.txt (original)
+++ peps/trunk/pep-3131.txt Tue Jun 5 20:54:31 2007
@@ -154,6 +154,91 @@
for RTL languages); if there is a need, these can be added
+Another open issue is the choice of normalization form: some
+people suggest using NFKC instead of NFC, others suggest
+banning compatibility characters.
+Ka-Ping Yee summarizes the discussion and further objections
+in _ as follows:
+A. Should identifiers be allowed to contain any Unicode letter?
+ Drawbacks of allowing non-ASCII identifiers wholesale:
+ 1. Python will lose the ability to make a reliable round trip to
+ a human-readable display on screen or on paper.
+ 2. Python will become vulnerable to a new class of security exploits;
+ code and submitted patches will be much harder to inspect.
+ 3. Humans will no longer be able to validate Python syntax.
+ 4. Unicode is young; its problems are not yet well understood and
+ solved; tool support is weak.
+ 5. Languages with non-ASCII identifiers use different character sets
+ and normalization schemes; PEP 3131's choices are non-obvious.
+ 6. The Unicode bidi algorithm yields an extremely confusing display
+ order for RTL text when digits or operators are nearby.
+B. Should the default behaviour accept only ASCII identifiers, or
+ should it accept identifiers containing non-ASCII characters?
+ Arguments for ASCII only by default:
+ 1. Allowing non-ASCII identifiers by default makes common
+ practice/assumptions subtly/unknowingly wrong; rarely wrong is
+ worse than obviously wrong.
+ 2. Better to raise a warning than to fail silently when encountering
+ a probably unexpected situation.
+ 3. All of current usage is ASCII-only; the vast majority of future
+ usage will be ASCII-only.
+ 4. It is the pockets of Unicode adoption that are parochial, not the
+ ASCII advocates.
+ 5. Python should audit for ASCII-only identifiers for the same
+ reasons that it audits for tab-space consistency.
+ 6. Incremental change is safer.
+ 7. An ASCII-only default favors open-source development and sharing
+ of source code.
+ 8. Existing projects won't have to waste any brainpower worrying
+ about the implications of Unicode identifiers.
+C. Should non-ASCII identifiers be optional?
+ Various voices in support of a flag (although there's been debate
+ over which should be the default, no one seems to be saying that
+ there shouldn't be an off switch).
+D. Should the identifier character set be configurable?
+ Various voices proposing and supporting a selectable character set,
+ so that users can get all the benefits of using their own language
+ without the drawbacks of confusable/unfamiliar characters.
+E. Which identifier characters should be allowed?
+ 1. What to do about bidi format control characters?
+ 2. What about other ID_Continue characters? What about characters
+ that look like punctuation? What about other recommendations
+ in UTS #39? What about mixed-script identifiers?
+F. Which normalization form should be used, NFC or NFKC?
+G. Should source code be required to be in normalized form?
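
(Not part of the diff.) Questions F and G above turn on how NFC and NFKC treat compatibility characters. The following is a minimal sketch using the standard `unicodedata` module; the helper names `normalized_forms` and `is_normalized_source` are hypothetical and are not taken from the PEP or from the interpreter.

```python
import unicodedata


def normalized_forms(identifier):
    """Return the (NFC, NFKC) forms of an identifier string."""
    return (
        unicodedata.normalize("NFC", identifier),
        unicodedata.normalize("NFKC", identifier),
    )


def is_normalized_source(text, form="NFC"):
    """Return True if text is already in the given normalization form.

    Hypothetical check for question G: a tool could reject source
    text for which this returns False.
    """
    return unicodedata.normalize(form, text) == text


# U+FB01 LATIN SMALL LIGATURE FI is a compatibility character:
# NFC keeps it as is, NFKC folds it to the two letters "fi".
nfc, nfkc = normalized_forms("\ufb01le")

# "e" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9
# under both forms, so the decomposed spelling is not normalized.
composed = normalized_forms("e\u0301")
```

Under NFC the ligature spelling and the plain spelling `file` would remain two distinct identifiers; under NFKC they would compare equal. That difference is what the choice in question F decides.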
@@ -161,6 +246,7 @@
..  http://www.unicode.org/reports/tr31/
..  http://www.unicode.org/reports/tr39/
..  http://www.unicode.org/reports/tr36/
+..  http://mail.python.org/pipermail/python-3000/2007-June/008161.html
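
(Not part of the diff.) Item B.5 in the roundup compares an ASCII-only audit to the existing tab-space consistency check. As an illustration only, such an audit could be sketched as an external tool on top of the standard `tokenize` module, assuming a Python 3 interpreter. The function name `non_ascii_identifiers` is hypothetical; the roundup does not specify any mechanism.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (line, column, name) for each identifier with non-ASCII characters.

    Only NAME tokens are inspected, so non-ASCII text inside string
    literals and comments is not reported.
    """
    found = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and any(ord(ch) > 127 for ch in tok.string):
            found.append((tok.start[0], tok.start[1], tok.string))
    return found


# One non-ASCII identifier on line 1; line 2 is plain ASCII.
report = non_ascii_identifiers("caf\u00e9 = 1\nspam = 2\n")

# A non-ASCII string literal is not an identifier and is not flagged.
clean = non_ascii_identifiers('x = "\u00e9"\n')
```

A project that wants the ASCII-only behaviour argued for in section B could run a check of this kind and emit a warning for each reported name rather than failing silently.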