[Python-Dev] Unicode 8.0 and 3.5

Steven D'Aprano steve at pearwood.info
Fri Jun 19 01:56:44 CEST 2015


On Thu, Jun 18, 2015 at 08:34:14PM +0100, MRAB wrote:
> On 2015-06-18 19:33, Larry Hastings wrote:
> >On 06/18/2015 11:27 AM, Terry Reedy wrote:
> >>Unicode 8.0 was just released.  Can we have unicodedata updated to
> >>match in 3.5?
> >>
> >
> >What does this entail?  Data changes, code changes, both?
> >
> It looks like just data changes.

At the very least, there is a change to the casefolding algorithm.
Cherokee was previously classified as unicameral but is now considered
bicameral (two cases, like English). Unusually, case-folding Cherokee
maps to uppercase rather than lowercase.
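
A quick way to see this from the interpreter is sketched below. It is
only an illustration: the helper names are made up for this example,
and the results depend entirely on which Unicode database the running
Python was built with, so the script reports the version rather than
assuming one.

```python
import unicodedata

CAPITAL_A = "\u13a0"  # CHEROKEE LETTER A
SMALL_A = "\uab70"    # CHEROKEE SMALL LETTER A, added in Unicode 8.0


def has_unicode8_data():
    """True if the bundled Unicode database is version 8.0 or later."""
    major = int(unicodedata.unidata_version.split(".")[0])
    return major >= 8


def cherokee_case_report():
    """Return how this interpreter folds and case-maps the two letters."""
    return {
        "fold_small": SMALL_A.casefold(),
        "fold_capital": CAPITAL_A.casefold(),
        "lower_capital": CAPITAL_A.lower(),
        "upper_small": SMALL_A.upper(),
    }


if __name__ == "__main__":
    print("Unicode database:", unicodedata.unidata_version)
    for key, value in cherokee_case_report().items():
        # Print code points rather than glyphs; many fonts lack Cherokee.
        print(key, " ".join("U+%04X" % ord(ch) for ch in value))
```

With a Unicode 8.0 or later database, both letters should fold to
U+13A0, the capital. With an older database the small letter is
unassigned and is returned unchanged.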

The full set of changes is listed here:

http://unicode.org/versions/Unicode8.0.0/

Apart from the addition of 7716 characters and changes to
str.casefold(), I don't think any of the changes will make a big
difference to Python's implementation. But it would be good to support
Unicode 8 (to the degree that Python actually does support Unicode,
rather than just the character set part of it).
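
As a rough check of whether a given build already knows about the new
characters, something like the following could be used. The sample code
points are ones that, as far as I know, were first assigned in Unicode
8.0; the list and the helper name are just for illustration and are
nowhere near the full set of additions.

```python
import unicodedata

# Sample code points believed to be new in Unicode 8.0 (an assumption of
# this sketch, not an exhaustive list): a Cherokee small letter and two
# pictographs.
SAMPLE_NEW_IN_8 = [0xAB70, 0x1F32D, 0x1F9C0]


def known_names(code_points):
    """Map each code point to its name, or None if the database lacks it."""
    return {cp: unicodedata.name(chr(cp), None) for cp in code_points}


if __name__ == "__main__":
    print("Unicode database:", unicodedata.unidata_version)
    for cp, name in known_names(SAMPLE_NEW_IN_8).items():
        print("U+%04X" % cp, name or "<not in this database>")
```

A None result means the bundled database predates the character, which
is what a pre-8.0 unicodedata would give for all three.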

 
> There are additional codepoints and a renamed property (which the
> standard library doesn't support anyway).

Which one are you referring to? Indic_Matra_Category, renamed to
Indic_Positional_Category?


-- 
Steve

