[Python-ideas] π = math.pi

Chris Angelico rosuav at gmail.com
Sat Jun 3 18:04:12 EDT 2017


On Sun, Jun 4, 2017 at 5:02 AM, Thomas Jollans <tjol at tjol.eu> wrote:
> On 03/06/17 20:41, Chris Angelico wrote:
>> [snip]
>> For reference, as well as the 948 Sm, there are 1690 Mn and 5777 So,
>> but only these characters are valid from them:
>>
>> \u1885 Mn MONGOLIAN LETTER ALI GALI BALUDA
>> \u1886 Mn MONGOLIAN LETTER ALI GALI THREE BALUDA
>> ℘ Sm SCRIPT CAPITAL P
>> ℮ So ESTIMATED SYMBOL
>>
>> 2118 SCRIPT CAPITAL P and 212E ESTIMATED SYMBOL are listed in
>> PropList.txt as Other_ID_Start, so they make sense. But that doesn't
>> explain the two characters from category Mn. It also doesn't explain
>> why U+309B and U+309C are *not* valid, despite being declared
>> Other_ID_Start. Maybe it's a bug? Maybe 309B and 309C somehow got
>> switched into 1885 and 1886??
>
> \u1885 and \u1886 are categorised as letters (category Lo) by my Python
> 3.5. (Which makes sense, right?) If your system puts them in category
> Mn, that's bound to be a bug somewhere.

rosuav@sikorsky:~$ python3.7 -c "import unicodedata; print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
9.0.0 Mn
rosuav@sikorsky:~$ python3.6 -c "import unicodedata; print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
8.0.0 Lo
rosuav@sikorsky:~$ python3.5 -c "import unicodedata; print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
8.0.0 Lo
rosuav@sikorsky:~$ python3.4 -c "import unicodedata; print(unicodedata.unidata_version, unicodedata.category('\u1885'))"
6.3.0 Lo
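Since the category of a code point can change with the bundled Unicode version, the counts and the short list of valid characters quoted above are version-dependent too. The sketch below shows one way to produce that kind of enumeration for whichever interpreter runs it. It is not necessarily how the quoted figures were obtained. It treats str.isidentifier() on a one-character string as the test for "valid", and the helper name scan() is hypothetical.

```python
import sys
import unicodedata
from collections import Counter

# General categories discussed in the thread: math symbols (Sm),
# nonspacing marks (Mn) and other symbols (So).
CATEGORIES = ("Sm", "Mn", "So")


def scan(categories=CATEGORIES):
    """Hypothetical helper: count code points per category and collect the
    ones that form a valid one-character identifier."""
    totals = Counter()
    valid = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        cat = unicodedata.category(ch)
        if cat not in categories:
            continue
        totals[cat] += 1
        # A single character passes isidentifier() only if it may start a name.
        if ch.isidentifier():
            valid.append(ch)
    return totals, valid


if __name__ == "__main__":
    totals, valid = scan()
    print("unidata_version", unicodedata.unidata_version, dict(totals))
    for ch in valid:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch),
              unicodedata.name(ch, "<unnamed>"))
```

Running it under each of the interpreters above should show how both the counts and the list move with unicodedata.unidata_version.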

Is it possible that there's a discrepancy between the Unicode version
used by the unicodedata module and the one used by the parser?
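One way to probe that from inside a single interpreter is sketched below. For the code points mentioned in this thread, it puts the category reported by unicodedata next to the result of str.isidentifier() and next to whether compile() accepts the character as an assignment target. The probe source string "<ch> = 1" and the helper names are hypothetical, chosen only for illustration. The sketch shows what one build does, but it cannot by itself tell which Unicode version the tokenizer's tables were generated from. A character that is not a letter yet is still accepted is the kind to look up in PropList.txt under Other_ID_Start.

```python
import sys
import unicodedata

# Code points mentioned in the thread.
CODE_POINTS = (0x1885, 0x1886, 0x2118, 0x212E, 0x309B, 0x309C)


def accepted_by_compiler(ch):
    """Hypothetical helper: True if compile() accepts `ch` as a variable name."""
    try:
        compile(f"{ch} = 1", "<probe>", "exec")
    except SyntaxError:
        return False
    return True


def report(code_points=CODE_POINTS):
    """Return (code point, category, isidentifier(), compiler verdict) rows."""
    return [
        (cp, unicodedata.category(chr(cp)), chr(cp).isidentifier(),
         accepted_by_compiler(chr(cp)))
        for cp in code_points
    ]


if __name__ == "__main__":
    print(sys.version.split()[0], "unidata", unicodedata.unidata_version)
    for cp, cat, ident, compiles in report():
        print(f"U+{cp:04X} {cat} isidentifier={ident} compiles={compiles}")
```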

ChrisA

