[Python-Dev] Odd lines in unicodedata_db.h

Stephen J. Turnbull stephen at xemacs.org
Sun Apr 4 12:59:14 CEST 2010


Amaury Forgeot d'Arc writes:

 > I don't think so. Unicode 3.2 did contain two entries with large
 > numeric values.  The file Unihan-3.2.0.txt contains these two
 > lines:
 > 
 > U+4EAC	kPrimaryNumeric	10,000,000,000,000,000 ten quadrillion (American)
 > U+5793	kPrimaryNumeric	100,000,000,000,000,000,000 hundred quintillion (American)

They are related to the Chinese numbering system, which names powers
of 10,000.  I recall U+4EAC having that value (10000^4) from my
textbooks; it's the "kyo" in "Tokyo" and the "jing" in "Beijing", so
quite memorable.  U+5793 looks familiar too.  It's not otherwise used
in Japanese AFAIK, so I'm not sure, but it seems quite plausible that
there would be a character for 10000^5.
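
As a quick check of the arithmetic, here is a minimal sketch that
computes 10000^4 and 10000^5 and asks the unicodedata module what it
reports for the two characters.  The names MYRIAD, values and
numeric_or_none are mine, purely for illustration.  What the lookups
return depends on the Unicode database your Python was built with, so
no output is shown.

```python
import unicodedata

# Powers of 10,000 matching the two kPrimaryNumeric values quoted above.
MYRIAD = 10_000
values = {
    "\u4eac": MYRIAD ** 4,  # U+4EAC: 10,000,000,000,000,000
    "\u5793": MYRIAD ** 5,  # U+5793: 100,000,000,000,000,000,000
}


def numeric_or_none(ch, db=unicodedata):
    """Return the numeric value the given database assigns to ch, or None."""
    return db.numeric(ch, None)


for ch, expected in values.items():
    current = numeric_or_none(ch)
    # unicodedata.ucd_3_2_0 exposes the Unicode 3.2 view of the database.
    old = numeric_or_none(ch, unicodedata.ucd_3_2_0)
    print(f"U+{ord(ch):04X} expected={expected:,} current={current} 3.2={old}")
```

Passing None as the default keeps the lookup from raising ValueError
when a character has no numeric value assigned.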

 > For some reason newer versions of the unicode standard removed
 > these values.

The characters are still there.  The numeric values were probably
removed because in practice they're not actually used (at least,
almost never in Japanese).  It seems a little sad to save 150 bytes or
so in a table and lose the historical meanings.

