Re: [Python-Dev] Python and the Unicode Character Database

2 Dec 2010

      "Martin v. Löwis" wrote:
...
...
...
Now, one may wonder what precisely a "possibly signed floating point
number" is, but most likely, this refers to
floatnumber   ::=  pointfloat | exponentfloat
pointfloat    ::=  [intpart] fraction | intpart "."
exponentfloat ::=  (intpart | pointfloat) exponent
intpart       ::=  digit+
fraction      ::=  "." digit+
exponent      ::=  ("e" | "E") ["+" | "-"] digit+
digit         ::=  "0"..."9"
I don't see why the language spec should limit the wealth of number
formats supported by float().
If it doesn't, there should be some other specification of what
is correct and what is not. It must not be unspecified.
True.
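The grammar quoted above can be transcribed into a regular expression, which makes it easy to see how narrow it is compared with what float() accepts. This is a minimal sketch that assumes "possibly signed" means an optional leading "+" or "-", and it ignores whitespace, "inf"/"nan" and anything else the quoted grammar does not mention. The helper name matches_spec is hypothetical.

```python
import re

# Transcription of the quoted floatnumber grammar. "Possibly signed" is
# assumed to mean an optional leading "+" or "-".
DIGIT = r"[0-9]"          # deliberately not \d: for str patterns \d matches any Unicode decimal digit
INTPART = rf"{DIGIT}+"
FRACTION = rf"\.{DIGIT}+"
EXPONENT = rf"[eE][+-]?{DIGIT}+"
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"
EXPONENTFLOAT = rf"(?:(?:{INTPART}|{POINTFLOAT}){EXPONENT})"
FLOATNUMBER = rf"[+-]?(?:{POINTFLOAT}|{EXPONENTFLOAT})"

_spec = re.compile(FLOATNUMBER)


def matches_spec(s: str) -> bool:
    """Hypothetical helper: True if s is a (signed) floatnumber per the grammar."""
    return _spec.fullmatch(s) is not None


for s in ["1.5e3", ".5", "5.", "-1e10", "5", "\u0661\u0662.\u0663"]:
    print(repr(s), matches_spec(s))
```

Both "5" and the Arabic-Indic string '١٢.٣' fall outside this grammar, yet float() accepts them (returning 5.0 and 12.3), which is the mismatch between the documentation and the implementation being discussed in this thread.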
...
...
It is not uncommon for Asians and other non-Latin script users to
use their own native script symbols for numbers. Just because these
digits may look strange to someone doesn't mean that they are
meaningless or should be discarded.
Then these users should speak up and indicate their need, or somebody
should speak up and confirm that there are users who actually want
'١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing
system in which '١٢٣٤.٥٦e4' means 12345600.0.
I'm not sure what you're after here.
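For reference, here is what a current CPython 3.x interpreter does with the strings mentioned above; the 2010 releases under discussion may differ in details. Each Arabic-Indic digit carries a decimal value in the UCD, and float(), int() and decimal.Decimal map such digits to their ASCII equivalents before parsing, while the "." and the "e4" suffix are still plain ASCII.

```python
import unicodedata
from decimal import Decimal

# U+0661..U+0666 are ARABIC-INDIC DIGIT ONE..SIX; "." is the ASCII full stop.
arabic = "\u0661\u0662\u0663\u0664.\u0665\u0666"   # '١٢٣٤.٥٦'

# Decimal values the bundled UCD assigns to each character (None for ".").
print([unicodedata.decimal(ch, None) for ch in arabic])

print(float(arabic))          # 1234.56
print(float(arabic + "e4"))   # 12345600.0, mixing native digits with an ASCII exponent
print(int(arabic[:4]))        # 1234
print(Decimal(arabic))        # 1234.56
```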
...
...
Please also remember that Python 3 now allows Unicode names for
identifiers for much the same reasons.
No no no. Addition of Unicode identifiers has a well-designed,
deliberate specification, with a PEP and all. The support for
non-ASCII digits in float appears to be ad-hoc, and not founded
on actual needs of actual users.
Please note that we didn't have PEPs and the PEP process at the
time. The Unicode proposal predates and in some respects inspired
the PEP process.

The decision to add this support was deliberate, based on the desire
to support as many of the nice features of Unicode in Python as
we could. At least that was what was driving me at the time.

Regarding actual needs of actual users: I don't buy that as an
argument when it comes to supporting a standard that is meant
to attract users with non-ASCII origins.

Some references you may want to read up on:

http://en.wikipedia.org/wiki/Numbers_in_Chinese_culture
http://en.wikipedia.org/wiki/Vietnamese_numerals
http://en.wikipedia.org/wiki/Korean_numerals
http://en.wikipedia.org/wiki/Japanese_numerals

Even MS Office supports them:

http://languages.siuc.edu/Chinese/Language_Settings.html
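It may help to separate which of these numeral forms float() actually handles. The sketch below reflects current CPython 3.x behavior as I understand it: fullwidth digits (U+FF10..U+FF19) are Unicode decimal digits and are accepted, whereas CJK ideographic numerals such as 一二三 carry a numeric value in the UCD but are not decimal digits, so float() rejects them.

```python
# Fullwidth digits are general category Nd, so float() maps them to ASCII digits.
fullwidth = "\uff11\uff12\uff13.\uff15"     # '１２３.５'

# CJK ideographic numerals have a numeric value but are not decimal digits.
cjk = "\u4e00\u4e8c\u4e09"                   # '一二三'

print(float(fullwidth))                     # 123.5
print(cjk.isnumeric(), cjk.isdecimal())     # numeric yes, decimal no

try:
    float(cjk)
except ValueError as exc:
    # float() only accepts decimal digits, not general numeric characters.
    print("rejected:", exc)
```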
...
...
Note that the support in float() (and the other numeric constructors)
to work with Unicode code points was explicitly added when Unicode
support was added to Python and has been available since Python 1.6.
That doesn't necessarily make it useful. Alexander's complaint is that
it makes Python unstable (i.e. changing as the UCD changes).
If that were true, then all Unicode Character Database (UCD) changes
would make Python unstable. However, most changes to existing code
points in the UCS are bug fixes, so they actually have a stabilizing
quality more than a destabilizing one.
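The dependency both sides describe can be made concrete: in CPython 3.x, the set of characters float() treats as digits is the set of code points the bundled UCD gives a decimal value, so it can change when Python upgrades its UCD. A minimal sketch; decimal_digit_chars is a hypothetical helper, and the printed count depends on the Python version, so no expected value is given.

```python
import sys
import unicodedata


def decimal_digit_chars():
    """Hypothetical helper: every code point the bundled UCD marks as a decimal digit."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.decimal(chr(cp), None) is not None]


digits = decimal_digit_chars()
print("UCD version:", unicodedata.unidata_version)
print("decimal digit characters:", len(digits))

# float() agrees with the UCD's decimal value for every one of them.
print(all(float(c) == unicodedata.decimal(c) for c in digits))
```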
...
...
It is not a bug by any definition of "bug".
Most certainly it is: the documentation is either underspecified,
or deviates from the implementation (when taking the most plausible
interpretation). This is the very definition of "bug".
The implementation is not a bug, and neither was this a bug in the
2.x series of the Python documentation. The Python 3.x docs apparently
introduced a reference to the language spec, which clearly does not
capture the wealth of possible inputs.

So, yes, we're talking about a documentation bug, but not an
implementation bug.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 29 2010)
...
...
...
Python/Zope Consulting and Support ...        http://www.egenix.com/
mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

M.-A. Lemburg