[Python-Dev] Python and the Unicode Character Database
M.-A. Lemburg
mal at egenix.com
Sun Nov 28 23:48:59 CET 2010
Alexander Belopolsky wrote:
> Two recently reported issues brought to light the fact that the Python
> language definition is closely tied to character properties maintained
> by the Unicode Consortium. [1,2] For example, when Python switches to
> Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two
> additional characters that Python can use in identifiers. [3]
>
> With Python 3.1:
>
>>>> exec('\u0CF1 = 1')
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "<string>", line 1
> ೱ = 1
> ^
> SyntaxError: invalid character in identifier
>
> but with Python 3.2a4:
>
>>>> exec('\u0CF1 = 1')
>>>> eval('\u0CF1')
> 1
Such changes are not new, but I agree that they should probably
be highlighted in the "What's new in Python x.x".
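Whether a character is accepted in an identifier depends on the Unicode Character Database version compiled into the interpreter. A minimal sketch for inspecting this with the standard `unicodedata` module follows; `identifier_report` is a hypothetical helper name, and its output will differ between Python builds that bundle different UCD versions.

```python
import unicodedata
from typing import Tuple


def identifier_report(ch: str) -> Tuple[str, str, bool]:
    """Hypothetical helper: report the UCD version bundled with this
    interpreter, the character's General Category, and whether the
    character on its own is a valid identifier."""
    return (
        unicodedata.unidata_version,
        unicodedata.category(ch),
        ch.isidentifier(),
    )


if __name__ == "__main__":
    # U+0CF1 KANNADA SIGN JIHVAMULIYA: the answer depends on the UCD
    # version, which is why the same exec() call behaves differently
    # in 3.1 and 3.2a4.
    print(identifier_report("\u0CF1"))
```

Printing `unicodedata.unidata_version` next to such checks makes it easier to tell whether a behaviour change came from a UCD upgrade rather than from a code change.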
> Of course, the likelihood is low that this change will affect any
> user, but the change in str.isspace() reported in [1] is likely to
> cause some trouble:
>
> Python 2.6.5:
>>>> u'A\u200bB'.split()
> [u'A', u'B']
>
> Python 2.7:
>>>> u'A\u200bB'.split()
> [u'A\u200bB']
That's a classic bug fix.
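A small sketch of the current behaviour, assuming Python 3.3 or later: U+200B ZERO WIDTH SPACE is a format character in the UCD, so `str.split()` no longer breaks on it. An application that really wants the old 2.6 behaviour can ask for it explicitly; `split_including_zwsp` is a hypothetical helper for illustration, not a standard API.

```python
import re
import unicodedata

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# In current UCD versions U+200B is a format character (Cf), so neither
# str.isspace() nor str.split() treats it as whitespace.
print(unicodedata.category(ZWSP))  # Cf
print(ZWSP.isspace())              # False
print("A\u200bB".split())          # ['A\u200bB']


def split_including_zwsp(text: str) -> list:
    """Hypothetical helper: split on Unicode whitespace and on U+200B,
    dropping empty fields the way str.split() does."""
    return [part for part in re.split(r"[\s\u200b]+", text) if part]


print(split_including_zwsp("A\u200bB"))  # ['A', 'B']
```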
> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins.
Why should we deviate from the work done by the Unicode Consortium?
After all, most of their changes are in fact bug fixes as well.
> For example,
> I don't think that supporting
>
>>>> float('١٢٣٤.٥٦')
> 1234.56
>
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.
Sorry, but I don't agree.
If ASCII numerals are an important aspect of an application, the
application should make sure that only those numerals are used
(e.g. by using a regular expression for checking).
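A minimal sketch of such a regular-expression check, assuming the application only needs unsigned decimal numbers such as `1234.56`; `ASCII_NUMBER` and `parse_ascii_float` are hypothetical names. Note that in Python 3 `str` patterns `\d` matches any Unicode decimal digit, so the character class has to be spelled out as `[0-9]` (or `re.ASCII` passed) to restrict it to ASCII.

```python
import re

# [0-9] rather than \d: in str patterns \d also matches e.g. Arabic-Indic
# digits (General Category Nd) unless re.ASCII is given.
ASCII_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def parse_ascii_float(text: str) -> float:
    """Hypothetical helper: accept only ASCII digits, then convert."""
    if ASCII_NUMBER.fullmatch(text) is None:
        raise ValueError(f"not an ASCII number: {text!r}")
    return float(text)


if __name__ == "__main__":
    arabic_indic = "\u0661\u0662\u0663\u0664.\u0665\u0666"
    print(float(arabic_indic))                        # 1234.56
    print(bool(re.fullmatch(r"\d+", "\u0661\u0662")))  # True
    print(parse_ascii_float("1234.56"))               # 1234.56
    try:
        parse_ascii_float(arabic_indic)
    except ValueError as exc:
        print(exc)  # rejected: not ASCII digits
```

The check is done before calling `float()`, so the conversion itself keeps its full Unicode support and only the application's own policy is stricter.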
In a Unicode world, not accepting non-ASCII numerals would be
a limitation, not a feature. Besides, Python has had this support
since Python 1.6.
> [1] http://bugs.python.org/issue10567
> [2] http://bugs.python.org/issue10557
> [3] http://www.unicode.org/versions/Unicode6.0.0/#Database_Changes
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Nov 28 2010)