[Python-Dev] Python and the Unicode Character Database

Mon Nov 29 01:01:12 CET 2010

On Sun, 28 Nov 2010 17:23:01 -0600
Benjamin Peterson <benjamin at python.org> wrote:
> 2010/11/28 M.-A. Lemburg <mal at egenix.com>:
> >
> >
> > "Martin v. Löwis" wrote:
> >>>> >>> float('١٢٣٤.٥٦')
> >>>> 1234.56
> >>
> >> I think it's a bug that this works. The definition of the float builtin says
> >>
> >> Convert a string or a number to floating point. If the argument is a
> >> string, it must contain a possibly signed decimal or floating point
> >> number, possibly embedded in whitespace. The argument may also be
> >> '[+|-]nan' or '[+|-]inf'.
> >>
> >> Now, one may wonder what precisely a "possibly signed floating point
> >> number" is, but most likely, this refers to
> >>
> >> floatnumber   ::=  pointfloat | exponentfloat
> >> pointfloat    ::=  [intpart] fraction | intpart "."
> >> exponentfloat ::=  (intpart | pointfloat) exponent
> >> intpart       ::=  digit+
> >> fraction      ::=  "." digit+
> >> exponent      ::=  ("e" | "E") ["+" | "-"] digit+
> >> digit         ::=  "0"..."9"
> >
> > I don't see why the language spec should limit the wealth of number
> > formats supported by float().
> >
> > It is not uncommon for Asians and other non-Latin script users to
> > use their own native script symbols for numbers. Just because these
> > digits may look strange to someone doesn't mean that they are
> > meaningless or should be discarded.
> 
> That's different. Python doesn't assign any semantic meaning to the
> characters in identifiers. The non-Latin support for numerals, though,
> could change the meaning of a program dramatically and needs to be
> well-specified. Whether int() should do this is debatable.

Perhaps int(), float(), Decimal() and friends could take an optional
parameter indicating whether non-ASCII digits are accepted. That would
satisfy all parties.
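
As a rough sketch of what such a switch might look like, here is a
wrapper around float(). The name parse_float and the keyword
allow_non_ascii_digits are hypothetical, invented for illustration only;
nothing like them exists in the builtins. It assumes that "non-ASCII
digit" means any character for which str.isdecimal() is true and that
lies outside "0".."9".

```python
def parse_float(text, allow_non_ascii_digits=False):
    """Hypothetical wrapper around float(); not an existing API.

    By default, reject any Unicode decimal digit outside ASCII "0".."9".
    With allow_non_ascii_digits=True, defer entirely to float().
    """
    if not allow_non_ascii_digits:
        for ch in text:
            # str.isdecimal() is true for ASCII digits and for other
            # Unicode decimal digits such as ARABIC-INDIC DIGIT ONE.
            if ch.isdecimal() and not ("0" <= ch <= "9"):
                raise ValueError(
                    "non-ASCII digit %r not allowed in %r" % (ch, text)
                )
    return float(text)


# The Arabic-Indic string from the example earlier in the thread,
# written with escapes to avoid bidirectional display issues.
ARABIC_INDIC = "\u0661\u0662\u0663\u0664.\u0665\u0666"

strict_ok = parse_float("1234.56")
lenient_ok = parse_float(ARABIC_INDIC, allow_non_ascii_digits=True)
```

With the default setting, parse_float(ARABIC_INDIC) raises ValueError,
while the lenient call behaves like plain float() does today.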

Antoine.