[Python-Dev] Python and the Unicode Character Database

Alexander Belopolsky alexander.belopolsky at gmail.com
Mon Nov 29 02:24:24 CET 2010


On Sun, Nov 28, 2010 at 7:55 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
..
>> Of course it is fun that Python can process Bengali numerals, but so
>> would be allowing Roman numerals. There is a reason why after careful
>> consideration, PEP 313 was ultimately rejected.
>
> Rejecting a proposed *new* capability is a different matter from
> disabling an *existing* capability which works in existing Python
> releases.

Was this capability ever documented?  It does not feel like a
deliberate feature.  If it were, '\N{ARABIC DECIMAL SEPARATOR}' would
be accepted in Arabic-Indic notation.  It feels more like a CPython
implementation detail similar to, say:

>>> int('10') is 10
True
>>> int('10000') is 10000
False
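To make the behaviour under discussion concrete, here is a small sketch, assuming a current Python 3 interpreter (the 2010-era releases may differ in detail). int() accepts Bengali and Arabic-Indic decimal digits, but a number written with the Arabic decimal separator (U+066B) is rejected by float(). Only the ASCII '.' works as the separator.

```python
# Sketch assuming a current Python 3 interpreter; older releases may differ.

bengali_twelve = "\u09e7\u09e8"        # BENGALI DIGIT ONE, BENGALI DIGIT TWO
arabic_indic_123 = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE

# Non-ASCII decimal digits are silently accepted.
print(int(bengali_twelve))    # 12
print(int(arabic_indic_123))  # 123

# Mixing Arabic-Indic digits with an ASCII '.' also works...
mixed = float("\u0661.\u0665")
print(mixed)                  # 1.5

# ...but full Arabic-Indic notation, using ARABIC DECIMAL SEPARATOR
# (U+066B), is not accepted.
try:
    float("\u0661\u066b\u0665")
    separator_rejected = False
except ValueError:
    separator_rejected = True
print(separator_rejected)     # True
```

The asymmetry is the point: the digits are accepted, but the notation as a whole is not supported.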

Note that the underlying PyUnicode_EncodeDecimal() function is
described in the unicodeobject.h header file as follows:

/* --- Decimal Encoder ---------------------------------------------------- */

/* Takes a Unicode string holding a decimal value and writes it into
   an output buffer using standard ASCII digit codes.
  ..
  The encoder converts whitespace to ' ', decimal characters to their
   corresponding ASCII digit and all other Latin-1 characters except
   \0 as-is. Characters outside this range (Unicode ordinals 1-256)
   are treated as errors. This includes embedded NULL bytes.
 */
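For readers who do not follow C, the rules in that header comment can be sketched in Python. This is a hypothetical rendering, not CPython's code: the name `encode_decimal` is made up for illustration, `str.isspace()` and `unicodedata.decimal()` stand in for the C-level whitespace and decimal lookups, and the sketch raises ValueError instead of applying the C function's error handling. Ordinals 1 to 255 are treated as Latin-1 pass-through.

```python
import unicodedata


def encode_decimal(text: str) -> bytes:
    """Hypothetical sketch of the decimal-encoder rules quoted above.

    Whitespace -> b' ', decimal characters -> ASCII digits,
    other Latin-1 characters except NUL -> passed through,
    anything else -> error.
    """
    out = bytearray()
    for pos, ch in enumerate(text):
        # Any Unicode whitespace becomes a plain ASCII space.
        if ch.isspace():
            out.append(ord(" "))
            continue
        # Any character with a decimal value (in any script) becomes '0'-'9'.
        value = unicodedata.decimal(ch, None)
        if value is not None:
            out.append(ord("0") + value)
            continue
        # Remaining Latin-1 characters, excluding NUL, are copied as-is.
        if 0 < ord(ch) < 256:
            out.append(ord(ch))
            continue
        # Everything else (including NUL) is unencodable in this sketch.
        raise ValueError(f"unencodable character {ch!r} at position {pos}")
    return bytes(out)


print(encode_decimal("\u09e7\u09e8"))  # b'12'
print(encode_decimal("\t1\u00a02"))    # b' 1 2'
```

The sketch shows where the Bengali-digit behaviour comes from. The decimal lookup is not restricted to ASCII, so any script's digits are folded to ASCII before parsing, while a character such as U+066B, which has no decimal value and lies outside Latin-1, is rejected.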

So the support for non-ASCII digits is accidental and should be
treated as a bug.

