[Python-Dev] Unicode 5.1.0

Terry Reedy tjreedy at udel.edu
Fri Aug 22 00:30:56 CEST 2008



Guido van Rossum wrote:
> I was just paid a visit by my Google colleague Mark Davis, co-founder
> of the Unicode project and the president of the Unicode Consortium. He
> would like to see improved Unicode support for Python. (Well duh. :-)
> On his list of top priorities are:
> 
> 1. Upgrade the unicodedata module to the Unicode 5.1.0 standard
> 2. Extend the unicodedata module with some additional properties
> 3. Add support for Unicode properties to the regex syntax, including
> Boolean combinations
> 
> I've tried to explain our release schedule and
> no-new-features-in-point-releases policies to him, and he understands
> that it's too late to add #2 or #3 to 2.6 and 3.0, and that these will
> have to wait for 2.7 and 3.1, respectively. However, I've kept the
> door slightly ajar for adding #1 -- it can't be too much work and it
> can't have too much impact. Or can it? I don't actually know what the
> impact would be, so I'd like some input from developers who are
> closer to the origins of the unicodedata module.
> 
> The two, quite separate, questions, then, are (a) how much work would
> it be to upgrade to version 5.1.0 of the database; and (b) would it be
> acceptable to do this post-beta3 (but before rc1). If the answer to
> (b) is positive, Google can help with (a).

http://www.unicode.org/versions/Unicode5.1.0/
"Unicode 5.1.0 contains over 100,000 characters, and provides 
significant additions and improvements..." to existing features, 
including new files and upgrades to existing files.  Sounds close to 
adding features ;-)
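
For anyone wanting to see what an upgrade would change in practice, here 
is a minimal sketch that reports which version of the Unicode database an 
interpreter was built with. The helper name ucd_at_least is made up for 
illustration; unicodedata.unidata_version and unicodedata.name are the 
standard module's own. U+1E9E is used as a probe because it is one of the 
characters added in 5.1.0.

```python
import unicodedata

# Version string of the Unicode Character Database compiled into
# this interpreter, e.g. "5.1.0".
print("bundled UCD version:", unicodedata.unidata_version)


def ucd_at_least(major, minor, patch=0):
    """Hypothetical helper: True if the bundled database is at least
    the given version."""
    parts = tuple(int(p) for p in unicodedata.unidata_version.split("."))
    return parts >= (major, minor, patch)


# U+1E9E was added in Unicode 5.1.0, so a pre-5.1.0 database has no
# name for it and the default (None) is returned instead.
sharp_s_name = unicodedata.name("\u1e9e", None)
print("U+1E9E:", sharp_s_name)
print("has 5.1.0 data:", ucd_at_least(5, 1))
```

On a build that still carries an older database, the name lookup should 
return None and the version check should be False.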

> In general, Google has needs in this area that can't wait for 2.7/3.1,
> so what we may end up doing is create internal implementations of all
> three features (compatible with Python 2.4 and later), publish them as
> open source on Google Code, and fold them into core Python at the
> first opportunity, which would likely be 2.7 and 3.1.

If possible, I would suggest going a bit further and releasing a 
third-party replacement/extension package, including a Windows 
installer, that is also listed on PyPI.  Revised releases could then be 
made more rapidly than the bugfix release schedule would allow, which 
might well be needed.  (The same could be done with other proposed 
new/revised modules.)

What would need to be done now, I believe, if possible and acceptable, 
is to slightly repackage the core to put unicode (3.0 strings) and _re* 
code in a separate library so that they can be drop-in replaced or masked.
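
As a rough illustration of what "masked" could look like for the part 
that is already an ordinary module, here is a sketch that prefers a 
separately installed replacement and falls back to the bundled 
unicodedata. The package name unicodedata_ext and both function names 
are hypothetical; no such package is implied to exist.

```python
import importlib
import sys


def load_unicodedata(replacement="unicodedata_ext"):
    """Return the replacement module if it is installed, otherwise the
    standard unicodedata. The default package name is hypothetical."""
    try:
        return importlib.import_module(replacement)
    except ImportError:
        return importlib.import_module("unicodedata")


def mask_unicodedata(module):
    """Make later 'import unicodedata' statements return the given module."""
    sys.modules["unicodedata"] = module


ucd = load_unicodedata()
mask_unicodedata(ucd)
print("using:", ucd.__name__, "with UCD", ucd.unidata_version)
```

This only swaps the module object that Python code imports. The string 
type and the regex engine are not reached this way, which is why the 
repackaging suggested above would be needed for them.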

Terry Jan Reedy


