[Python-ideas] π = math.pi

Chris Angelico rosuav at gmail.com
Sat Jun 3 14:41:22 EDT 2017


On Sun, Jun 4, 2017 at 2:48 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sun, Jun 04, 2017 at 02:36:50AM +1000, Steven D'Aprano wrote:
>
>> But Python 3.5 does treat it as an identifier!
>>
>> py> ℘ = 1  # should be a SyntaxError ?
>> py> ℘
>> 1
>>
>> There's a bug here, somewhere, I'm just not sure where...
>
> That appears to be the only Symbol Math character which is accepted as
> an identifier in Python 3.5:
>
> py> import unicodedata
> py> all_unicode = map(chr, range(0x110000))
> py> symbols = [c for c in all_unicode if unicodedata.category(c) == 'Sm']
> py> len(symbols)
> 948
> py> ns = {}
> py> for c in symbols:
> ...     try:
> ...             exec(c + " = 1", ns)
> ...     except SyntaxError:
> ...             pass
> ...     else:
> ...             print(c, unicodedata.name(c))
> ...
> ℘ SCRIPT CAPITAL P
> py>

Curious. And not specific to 3.5 - the exact same thing happens in
3.7. Here's the full category breakdown:

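import collections
import unicodedata

cats = collections.defaultdict(int)  # category -> number of valid one-char names
ns = {}

for c in map(chr, range(1, 0x110000)):
    try:
        exec(c + " = 1", ns)
    except SyntaxError:
        pass  # not accepted as an identifier
    except UnicodeEncodeError:
        # Lone surrogates can't be encoded; anything else is unexpected.
        if unicodedata.category(c) != "Cs":
            raise
    else:
        cats[unicodedata.category(c)] += 1

print(cats)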

defaultdict(<class 'int'>, {'Po': 1, 'Lu': 1702, 'Pc': 1, 'Ll': 2063,
'Lo': 112703, 'Lt': 31, 'Lm': 245, 'Nl': 236, 'Mn': 2, 'Sm': 1, 'So':
1})

For reference, besides the 948 Sm characters there are 1690 Mn and
5777 So, but only these four are accepted from those three categories:

\u1885 Mn MONGOLIAN LETTER ALI GALI BALUDA
\u1886 Mn MONGOLIAN LETTER ALI GALI THREE BALUDA
℘ Sm SCRIPT CAPITAL P
℮ So ESTIMATED SYMBOL

U+2118 SCRIPT CAPITAL P and U+212E ESTIMATED SYMBOL are listed in
PropList.txt as Other_ID_Start, so accepting them makes sense. But that
doesn't explain the two characters from category Mn. It also doesn't
explain why U+309B and U+309C are *not* valid, despite being declared
Other_ID_Start. Maybe it's a bug? Maybe U+309B and U+309C somehow got
switched into U+1885 and U+1886??
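
For spot-checking individual code points without looping over all of
Unicode, something like the following should do. It's only a sketch: it
assumes str.isidentifier() agrees with the compiler for single-character
names, and the helper name identifier_report and the CANDIDATES list are
made up for illustration. The categories and names it prints depend on
the Unicode database bundled with whichever Python runs it.

```python
import unicodedata

# Code points mentioned in this thread; the selection is illustrative only.
CANDIDATES = [0x1885, 0x1886, 0x2118, 0x212E, 0x309B, 0x309C]


def identifier_report(code_points):
    """Return (label, category, name, is_identifier) for each code point.

    Hypothetical helper: is_identifier comes from str.isidentifier(),
    which is assumed here to match what the compiler accepts.
    """
    rows = []
    for cp in code_points:
        ch = chr(cp)
        rows.append((
            "U+%04X" % cp,
            unicodedata.category(ch),
            unicodedata.name(ch, "<unnamed>"),
            ch.isidentifier(),
        ))
    return rows


for row in identifier_report(CANDIDATES):
    print(*row)
```

The last column is the interesting one: True means the character can
stand alone as a name.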

ChrisA

