[Python-3000] PEP 3131 accepted

Jim Jewett jimjjewett at gmail.com
Wed May 23 23:25:00 CEST 2007


On 5/23/07, Guido van Rossum <guido at python.org> wrote:
> On 5/23/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> > Certain cut-and-paste errors (such as cutting from a word document
> > that uses "smart quotes") will change from syntax errors to silently
> > creating new identifiers.

> Really? Are those quote characters considered letters by the Unicode standard?

I'm not certain which specific character MS Word uses for smart
quotes.  My best guess is that it is actually "PRIVATE USE 1", which
is supposed to be ignored (don't prevent it; just pretend it isn't
there).
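
For reference, a minimal sketch of how the question above could be
checked directly.  It assumes the smart quotes in question are the
usual curly quotes U+2018, U+2019, U+201C and U+201D, which is what
the Windows-1252 bytes 0x91-0x94 decode to.  The helper name
`describe` is made up for illustration.  If the same bytes are misread
as Latin-1 they land on C1 control codes instead, and U+0091 is the
one labelled PRIVATE USE ONE.

```python
import unicodedata

# Assumption: these are the "smart quote" code points Word typically emits.
SMART_QUOTES = "\u2018\u2019\u201c\u201d"


def describe(ch):
    """Return (code point, general category, valid identifier?) for one character."""
    return ("U+%04X" % ord(ch), unicodedata.category(ch), ch.isidentifier())


for ch in SMART_QUOTES:
    print(describe(ch))

# The same quotes as Windows-1252 bytes, decoded two different ways.
raw = b"\x91\x92\x93\x94"
as_cp1252 = raw.decode("cp1252")    # the curly quotes listed above
as_latin1 = raw.decode("latin-1")   # C1 control characters U+0091..U+0094
print([unicodedata.category(c) for c in as_latin1])
```

Under these assumptions the curly quotes are punctuation (categories
Pi and Pf), not letters, and the Latin-1 misreading yields control
characters (Cc), so neither form would qualify as an identifier
character on category alone.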

My fears were heightened by
http://www.unicode.org/reports/tr31/tr31-8.html.  They discuss NFKC
canonicalization (though another tech report recommends NFKD).  If you
use NFKC, they say to modify it, because U+0374 ( ʹ ) GREEK NUMERAL
SIGN should not be allowed, but it folds to U+02B9 ( ʹ ) MODIFIER
LETTER PRIME, which they claim should be allowed.
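
The folding described above can be observed with the standard
`unicodedata` module.  This is only a sketch; the helper name
`folds_to` is hypothetical, and the result reflects whatever Unicode
version the running interpreter ships, not necessarily the 2007 data.

```python
import unicodedata

GREEK_NUMERAL_SIGN = "\u0374"
MODIFIER_LETTER_PRIME = "\u02b9"


def folds_to(ch, form="NFKC"):
    """Return the normalized form of a single character."""
    return unicodedata.normalize(form, ch)


folded = folds_to(GREEK_NUMERAL_SIGN)
print("U+%04X -> U+%04X" % (ord(GREEK_NUMERAL_SIGN), ord(folded)))
print(unicodedata.name(folded), unicodedata.category(folded))
```

Because the mapping is a canonical singleton decomposition, NFKD
produces the same result as NFKC here, so the choice between those two
forms would not change this particular case.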

Within the codepoints < 256, if we ban rather than ignore, the only
remaining problems are likely to be

(1)  that we must add _ as an allowed ID start, and
(2)  we must decide whether or not to allow the recommended

00AA          ; ID_Start # L&       FEMININE ORDINAL INDICATOR
00B5          ; ID_Start # L&       MICRO SIGN
00BA          ; ID_Start # L&       MASCULINE ORDINAL INDICATOR

(also in XID_START, and in the CONTINUE sets)
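
A small sketch for inspecting the characters in question, assuming a
Python 3 interpreter (where `str.isidentifier` follows the rules that
were eventually adopted, which may differ from what was on the table
in this thread).  The helper name `report` is made up for
illustration.

```python
import unicodedata

# Underscore plus the three Latin-1 characters listed above.
CANDIDATES = "_\u00aa\u00b5\u00ba"


def report(ch):
    """Summarize how one character behaves as a one-character identifier."""
    return {
        "code": "U+%04X" % ord(ch),
        "is_identifier": ch.isidentifier(),
        "nfkc": unicodedata.normalize("NFKC", ch),
    }


rows = [report(ch) for ch in CANDIDATES]
for row in rows:
    print(row)
```

Note that NFKC maps the two ordinal indicators to plain "a" and "o",
and MICRO SIGN to GREEK SMALL LETTER MU, so allowing them means they
become alternate spellings of other identifiers once normalization is
applied.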

-jJ

