[Python-Dev] Python and the Unicode Character Database

M.-A. Lemburg mal at egenix.com
Sun Nov 28 23:48:59 CET 2010


Alexander Belopolsky wrote:
> Two recently reported issues brought to light the fact that the Python
> language definition is closely tied to character properties maintained
> by the Unicode Consortium. [1,2]  For example, when Python switches to
> Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two
> additional characters that Python can use in identifiers. [3]
> 
> With Python 3.1:
> 
>>>> exec('\u0CF1 = 1')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "<string>", line 1
>    ೱ = 1
>      ^
> SyntaxError: invalid character in identifier
> 
> but with Python 3.2a4:
> 
>>>> exec('\u0CF1 = 1')
>>>> eval('\u0CF1')
> 1
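A minimal sketch, assuming Python 3, that reports which Unicode Character Database version the running interpreter bundles (`unicodedata.unidata_version`) and how that database classifies a given character. The helper name `ucd_report` is hypothetical. The result for U+0CF1 depends on the interpreter's UCD version, so no output is shown for it.

```python
import unicodedata

def ucd_report(ch: str) -> dict:
    """Hypothetical helper: summarize how this interpreter's UCD classifies one character."""
    return {
        "ucd_version": unicodedata.unidata_version,  # UCD version compiled into this Python
        "name": unicodedata.name(ch, "<unnamed>"),   # default avoids ValueError for unnamed code points
        "category": unicodedata.category(ch),        # general category, e.g. 'Lo' for other letters
        "isidentifier": ch.isidentifier(),           # whether ch alone is a valid identifier
    }

if __name__ == "__main__":
    # KANNADA SIGN JIHVAMULIYA, the character used in the exec() example above
    print(ucd_report("\u0CF1"))
```

Running this under different Python releases shows the identifier rules moving together with the bundled UCD version.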

Such changes are not new, but I agree that they should probably
be highlighted in the "What's New in Python x.x" document.

> Of course, the likelihood is low that this change will affect any
> user, but the change in str.isspace() reported in [1] is likely to
> cause some trouble:
> 
> Python 2.6.5:
>>>> u'A\u200bB'.split()
> [u'A', u'B']
> 
> Python 2.7:
>>>> u'A\u200bB'.split()
> [u'A\u200bB']

That's a classic bug fix.
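For context: U+200B ZERO WIDTH SPACE has general category Cf (format character), not Zs (space separator), so Python 3's `str.isspace()` does not treat it as whitespace. The sketch below checks this and, assuming an application really does want the old splitting behaviour, shows a hypothetical helper `split_with_zwsp` that lists U+200B as a separator explicitly.

```python
import re
import unicodedata

ZWSP = "\u200b"

# U+200B is a format character (Cf), not a space separator (Zs)
print(unicodedata.name(ZWSP), unicodedata.category(ZWSP))  # ZERO WIDTH SPACE Cf
print(ZWSP.isspace())                                      # False
print(("A" + ZWSP + "B").split())                          # ['A\u200bB']

def split_with_zwsp(text: str) -> list:
    """Hypothetical helper: split on whitespace *and* on U+200B."""
    # \s does not match U+200B for str patterns, so it is added to the class explicitly;
    # empty strings produced by leading/trailing separators are dropped.
    return [part for part in re.split(r"[\s\u200b]+", text) if part]

print(split_with_zwsp("A" + ZWSP + "B"))                   # ['A', 'B']
```

Making the extra separator explicit keeps the application's choice visible instead of relying on a particular UCD version.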

> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins. 

Why should we diverge from the work done by the Unicode Consortium?
After all, most of their changes are in fact bug fixes as well.

> For example,
> I don't think that supporting
> 
>>>> float('١٢٣٤.٥٦')
> 1234.56
> 
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.

Sorry, but I don't agree.

If ASCII numerals are an important aspect of an application, the
application should make sure that only those numerals are used
(e.g. by using a regular expression for checking).
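A minimal sketch of the regular-expression check suggested above, assuming the application only wants unsigned ASCII decimals such as `1234.56`. In Python 3, `\d` matches any Unicode decimal digit for str patterns (unless `re.ASCII` is given), so the pattern spells out `[0-9]`. `ASCII_NUMBER` and `parse_ascii_float` are illustrative names, not a standard API.

```python
import re

# Only ASCII digits with an optional fractional part, e.g. "1234.56".
# [0-9] is used instead of \d, which also matches non-ASCII decimal digits.
ASCII_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")

def parse_ascii_float(text: str) -> float:
    """Hypothetical helper: convert text to float only if it uses ASCII numerals."""
    if ASCII_NUMBER.fullmatch(text) is None:
        raise ValueError(f"not an ASCII number: {text!r}")
    return float(text)

print(parse_ascii_float("1234.56"))  # 1234.56
try:
    parse_ascii_float("\u0661\u0662\u0663\u0664.\u0665\u0666")  # Arabic-Indic digits
except ValueError as exc:
    print(exc)
```

The validation happens before `float()` is called, so the application's ASCII-only policy does not depend on what `float()` happens to accept.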

In a Unicode world, not accepting numerals other than the ASCII digits
0-9 would be a limitation, not a feature. Besides, Python has had this
support since Python 1.6.
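To illustrate the behaviour being defended, a short check, assuming Python 3, that `int()` and `float()` accept Unicode decimal digits such as the ARABIC-INDIC DIGITs (U+0660 to U+0669) from the `float()` example above. The helper `to_ascii_digits` is hypothetical; it only shows how `unicodedata.decimal()` maps each digit to its value.

```python
import unicodedata

arabic_indic = "\u0661\u0662\u0663\u0664.\u0665\u0666"  # same text as in the float() example

print(float(arabic_indic))            # 1234.56
print(int("\u0661\u0662\u0663"))      # 123
print(unicodedata.decimal("\u0665"))  # 5

def to_ascii_digits(text: str) -> str:
    """Hypothetical helper: replace every Unicode decimal digit with its ASCII counterpart."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for characters that are not decimal digits
        out.append(ch if value is None else str(value))
    return "".join(out)

print(to_ascii_digits(arabic_indic))  # 1234.56
```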

> [1] http://bugs.python.org/issue10567
> [2] http://bugs.python.org/issue10557
> [3] http://www.unicode.org/versions/Unicode6.0.0/#Database_Changes

-- 
Marc-Andre Lemburg
eGenix.com


