[Python-Dev] Python and the Unicode Character Database
Stephen J. Turnbull
stephen at xemacs.org
Tue Nov 30 05:20:11 CET 2010
M.-A. Lemburg writes:
> Just because ASCII-proponents may have a hard time reading such
> literals,
That's not the point.
> doesn't mean that script users have the same trouble.
The script users may have no trouble reading them, but that doesn't
mean it's not a YAGNI. In Japanese, it's a YAGNI except in addresses
on New Year cards and in dates, which could be handled by specialized
modules, or by a generic module for extracting numeric information
from general (as opposed to program) text. Neither of those is likely
to appear in program text in a context where it would be used as a
numeric literal.
In fact, Python *does* consider it a YAGNI for Han! Although my
apartment number would be written "七〇四" on a New Year card, Python
won't parse it as 704: unicodedata considers those digits to be Lo,
except for "〇" which fails anyway because it's Nl, not Nd. (To add
insult to injury, it doesn't even return numeric values for those
characters, even though any Han-user would consider them numeric when
used in isolation, except that Japanese would be likely to consider
"〇" to be the non-numeric "maru" symbol, i.e., a circle, meaning "OK"!)
The whole concept of numeric in Unicode is a mess; why import that
mess into Python?
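The behaviour described above can be checked with a short sketch along these lines. The helper names describe() and try_int() are made up for illustration and are not part of any library. The numeric value that unicodedata.numeric() reports for the Han characters depends on the Unicode database bundled with the interpreter, so the sketch prints it rather than assuming a result.

```python
import unicodedata


def describe(text):
    """Return (char, general category, numeric value or None) per character."""
    # numeric() takes an optional default, returned when no value is defined.
    return [(ch, unicodedata.category(ch), unicodedata.numeric(ch, None))
            for ch in text]


def try_int(text):
    """Return int(text), or None if Python rejects the string."""
    try:
        return int(text)
    except ValueError:
        return None


apartment = "七〇四"  # 704 as it might be written on a New Year card
for ch, category, value in describe(apartment):
    print(ch, category, value)

# int() only accepts characters of category Nd, so the Han form is rejected,
# while the Arabic-Indic digits U+0667 U+0660 U+0664 are accepted.
print(try_int(apartment))
print(try_int("\u0667\u0660\u0664"))
```

The categories come out as Lo, Nl, Lo, which is why int() refuses the Han string while accepting the Arabic-Indic one.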
Can you give any examples where people do computation, keep books, or
do nuclear physics in non-Arabic numerals? I suppose Arabic users
might, but even there I suspect not.