[Python-Dev] Python and the Unicode Character Database
Alexander Belopolsky
alexander.belopolsky at gmail.com
Mon Nov 29 02:24:24 CET 2010
On Sun, Nov 28, 2010 at 7:55 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
..
>> Of course it is fun that Python can process Bengali numerals, but so
>> would be allowing Roman numerals. There is a reason why after careful
>> consideration, PEP 313 was ultimately rejected.
>
> Rejecting a proposed *new* capability is a different matter from
> disabling an *existing* capability which works in existing Python
> releases.
Was this capability ever documented? It does not feel like a
deliberate feature. If it were, '\N{ARABIC DECIMAL SEPARATOR}' would
be accepted in Arabic-Indic notation. It feels more like a CPython
implementation detail, similar to, say:
>>> int('10') is 10
True
>>> int('10000') is 10000
False
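The asymmetry with the decimal separator can be checked directly. The
following is only a small sketch; the variable names are made up for
illustration, and it assumes a Python 3 interpreter where str is
Unicode:

```python
# Arabic-Indic digits one, two, three, spelled by their Unicode names.
arabic_indic_123 = (
    '\N{ARABIC-INDIC DIGIT ONE}'
    '\N{ARABIC-INDIC DIGIT TWO}'
    '\N{ARABIC-INDIC DIGIT THREE}'
)

# The digits alone are accepted by both int() and float().
as_int = int(arabic_indic_123)
as_float = float(arabic_indic_123)

# The separator that belongs to the same notation is not accepted.
with_separator = (
    arabic_indic_123
    + '\N{ARABIC DECIMAL SEPARATOR}'
    + '\N{ARABIC-INDIC DIGIT FIVE}'
)
try:
    float(with_separator)
    separator_accepted = True
except ValueError:
    separator_accepted = False
```

So the digits are converted, but a number written entirely in that
notation still cannot be parsed.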
Note that the underlying PyUnicode_EncodeDecimal() function is
described in the unicodeobject.h header file as follows:
/* --- Decimal Encoder ---------------------------------------------------- */
/* Takes a Unicode string holding a decimal value and writes it into
an output buffer using standard ASCII digit codes.
..
The encoder converts whitespace to ' ', decimal characters to their
corresponding ASCII digit and all other Latin-1 characters except
\0 as-is. Characters outside this range (Unicode ordinals 1-256)
are treated as errors. This includes embedded NULL bytes.
*/
So the support for non-ASCII digits is accidental and should be
treated as a bug.
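For readers who do not want to dig through the C source, here is a
rough pure-Python approximation of the behaviour that the header
comment describes. It is a sketch written from the comment alone, not
a transcription of the C code; encode_decimal_sketch is a hypothetical
name, and the order of the checks is an assumption:

```python
import unicodedata


def encode_decimal_sketch(text):
    """Approximate the decimal encoder described in unicodeobject.h."""
    out = []
    for ch in text:
        # Any character with a Unicode decimal value maps to an ASCII digit.
        digit = unicodedata.decimal(ch, None)
        if digit is not None:
            out.append(str(digit))
        elif ch.isspace():
            # All whitespace is normalised to a plain space.
            out.append(' ')
        elif 0 < ord(ch) < 256:
            # Other Latin-1 characters (except NUL) pass through unchanged.
            out.append(ch)
        else:
            raise ValueError('invalid decimal Unicode string: %r' % text)
    return ''.join(out)
```

With this sketch, encode_decimal_sketch('\u09e7\u09e8') (Bengali digits
one and two) gives '12', while a string containing
'\N{ARABIC DECIMAL SEPARATOR}' raises ValueError, since that character
is neither a decimal digit nor within Latin-1.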