[Python-3000] PEP 3131 accepted

Ka-Ping Yee python at zesty.ca
Wed May 23 23:11:00 CEST 2007


On Wed, 23 May 2007, Guido van Rossum wrote:
> On 5/23/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> > Certain cut-and-paste errors (such as cutting from a word document
> > that uses "smart quotes") will change from syntax errors to silently
> > creating new identifiers.
>
> Really? Are those quote characters considered letters by the Unicode standard?

According to the table at

    http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-331.html

the following quote-like characters are not identifier characters:

    U+2018 LEFT SINGLE QUOTATION MARK
    U+2019 RIGHT SINGLE QUOTATION MARK
    U+201C LEFT DOUBLE QUOTATION MARK
    U+201D RIGHT DOUBLE QUOTATION MARK

I believe these four are the "smart quotes" produced by Word.

But the following are identifier characters:

    U+02BB MODIFIER LETTER TURNED COMMA (same glyph as U+2018)
    U+02BC MODIFIER LETTER APOSTROPHE (same glyph as U+2019)
    U+02EE MODIFIER LETTER DOUBLE APOSTROPHE (same glyph as U+201D)
    U+0312 COMBINING TURNED COMMA ABOVE (same glyph as U+2018)
    U+0313 COMBINING COMMA ABOVE (same glyph as U+2019)
    U+0315 COMBINING COMMA ABOVE RIGHT (same glyph as U+2019)

So there are three sets of characters that look the same:

    U+02BB = U+0312 = U+2018
    U+02BC = U+0313 = U+0315 = U+2019
    U+02EE = U+201D
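
The classification above can be spot-checked with a short sketch like
the one below.  It assumes a current CPython 3, whose bundled Unicode
database and identifier rules may differ in detail from the 2007 table
linked above; the names QUOTES, LOOKALIKES, and allowed_after_letter
are made up for this illustration.

```python
import unicodedata

# The four "smart quotes" and the six look-alikes listed above.
QUOTES = ["\u2018", "\u2019", "\u201c", "\u201d"]
LOOKALIKES = ["\u02bb", "\u02bc", "\u02ee", "\u0312", "\u0313", "\u0315"]


def allowed_after_letter(ch):
    """Return True if ch may follow an ASCII letter inside an identifier."""
    return ("a" + ch).isidentifier()


# Print code point, general category, and whether "a" + ch is an identifier.
for ch in QUOTES + LOOKALIKES:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), allowed_after_letter(ch))
```

On such an interpreter the four quotation marks should come out as
punctuation and be rejected, while the six look-alikes should be
accepted after a letter.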

U+0312, U+0313, and U+0315 are combining characters that place the
comma over the preceding letter, and they are not allowed to appear
as the first character in an identifier.  So, if your editor renders
combining characters properly combined, they will not be confusable
with quotation marks; otherwise, they could be.
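
The first-character restriction can be illustrated the same way.  This
is only a sketch under the same assumption (a current CPython 3, not
the 2007 draft rules); COMBINING, MODIFIER_LETTERS, and
can_start_identifier are hypothetical names chosen for the example.

```python
import unicodedata

COMBINING = ["\u0312", "\u0313", "\u0315"]         # combining marks
MODIFIER_LETTERS = ["\u02bb", "\u02bc", "\u02ee"]  # spacing modifier letters


def can_start_identifier(ch):
    """Return True if the single character ch is a valid identifier by itself."""
    return ch.isidentifier()


for ch in COMBINING + MODIFIER_LETTERS:
    print(
        f"U+{ord(ch):04X}",
        "combining" if unicodedata.combining(ch) else "spacing",
        "may start an identifier" if can_start_identifier(ch)
        else "continuation only",
    )
```

Here the three combining marks should be reported as continuation
only.  The three modifier letters are spacing characters and, at least
in current CPython, can also begin an identifier, so correct rendering
of combining marks does not help with those.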


-- ?!ng

