[Python-Dev] Python and the Unicode Character Database
Terry Reedy
tjreedy at udel.edu
Wed Dec 1 00:19:30 CET 2010
On 11/30/2010 10:05 AM, Alexander Belopolsky wrote:
My general answers to the questions you have raised are as follows:
1. Each new feature release should use the latest version of the UCD as
of the first beta release (or perhaps a week or so before). New chars
are new features and the beta period can be used to (hopefully) iron out
any bugs introduced by a new UCD version.
2. The language specification should not be UCD version specific. Martin
pointed out that the definition of identifiers was intentionally written
to not be, by referring to 'current version' or some such. On the other
hand, the UCD version used should be programmatically discoverable,
perhaps as an attribute of sys or str.
3. The UCD should not change in bugfix releases. New chars are new
features. Adding them in bugfix releases will introduce gratuitous
incompatibilities between releases. People who want the latest Unicode
should either upgrade to the latest Python version or patch an older
version (but not expect core support for any problems that creates).
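To illustrate the discoverability part of point 2: the unicodedata module
already exposes the bundled UCD version as unidata_version, so a check along
the following lines is possible. This is only a sketch; the helper name
ucd_at_least is made up here, and a sys or str attribute as suggested above
is an idea, not an existing API.

```python
import unicodedata


def ucd_at_least(major, minor=0):
    """Hypothetical helper: is the bundled UCD at least major.minor?"""
    # unidata_version is a string such as "6.0.0"
    parts = unicodedata.unidata_version.split(".")
    have = (int(parts[0]), int(parts[1]))
    return have >= (major, minor)


print(unicodedata.unidata_version)
print(ucd_at_least(6, 0))
```

Code that depends on characters added in a particular Unicode version could
test for that version this way instead of testing the Python version.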
> Given that 2.7 will be maintained for 5 years and arguably Unicode
> Consortium takes backward compatibility very seriously, wouldn't it
> make sense to consider a backport at some point?
>
> I am sure we will soon see a bug report that the following does not
> work in 2.7: :-)
> >>> ord('\N{CAT FACE WITH WRY SMILE}')
> 128572
3 (cont). 2.7 is no different in that regard. It is feature frozen just
like all other x.y releases. And that is the answer to any such report.
If that code became valid in 2.7.2, for instance, it would still not
work in 2.7 and 2.7.1. Not working is not a bug; working is a new
feature introduced after 2.7 was released.
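Code that must run across releases with different UCD versions can treat a
missing character name as an expected condition rather than a bug. A minimal
sketch, assuming Python 3.3 or later (so that every character is a single
code point); the function name code_point_or_none is invented for this
example.

```python
import unicodedata


def code_point_or_none(name):
    """Return the code point for a character name, or None if the
    bundled UCD does not know that name (hypothetical helper)."""
    try:
        return ord(unicodedata.lookup(name))
    except KeyError:
        # unicodedata.lookup raises KeyError for unknown names
        return None


# 128572 where the UCD includes this character, None on older data
print(code_point_or_none("CAT FACE WITH WRY SMILE"))
```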
>>> - How specific should library reference manual be in defining methods
>>> affected by UCD such as str.upper()?
>>
>> It should specify what this actually does in Unicode terminology
>> (probably in addition to a layman's rephrase of that)
>>
>
> I opened an issue for this:
>
> http://bugs.python.org/issue10587
1,2 (cont). Good idea in general.
> I was more concerned about wide and narrow unicode CPython builds. Is
> it a bug that '\UXXXXXXXX'.isalpha() may disagree even when the two
> implementations are based on the same version of UCD?
4. While the difference between narrow/wide builds of (CPython) x.y
(which should have one constant UCD) cannot be completely masked, I
appreciate and generally agree with your efforts to minimize them. In
some cases, there will be a conflict/tradeoff between eliminating this
difference versus that.
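For readers unfamiliar with why the two builds can disagree, here is a rough
sketch. A narrow build stores a non-BMP character as two UTF-16 surrogate
code units, and neither surrogate is alphabetic on its own. The helper
utf16_units is written for this example only, and U+10400 (DESERET CAPITAL
LETTER LONG I) is just a convenient non-BMP letter. sys.maxunicode is 0xFFFF
on a narrow build and 0x10FFFF otherwise.

```python
import sys


def utf16_units(code_point):
    """Split a code point into the UTF-16 code units a narrow build stores."""
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000
    # high surrogate carries the top 10 bits, low surrogate the bottom 10
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


is_narrow = sys.maxunicode == 0xFFFF
units = utf16_units(0x10400)

print(is_narrow)
print([hex(u) for u in units])
# Each surrogate, taken alone, is not a letter
print([chr(u).isalpha() for u in units])
```

A per-code-unit test therefore cannot see that the pair as a whole is a
letter unless the implementation combines surrogates before consulting the
UCD.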
--
Terry Jan Reedy