[Python-Dev] Unicode 8.0 and 3.5
Steven D'Aprano
steve at pearwood.info
Fri Jun 19 01:56:44 CEST 2015
On Thu, Jun 18, 2015 at 08:34:14PM +0100, MRAB wrote:
> On 2015-06-18 19:33, Larry Hastings wrote:
> >On 06/18/2015 11:27 AM, Terry Reedy wrote:
> >>Unicode 8.0 was just released. Can we have unicodedata updated to
> >>match in 3.5?
> >>
> >
> >What does this entail? Data changes, code changes, both?
> >
> It looks like just data changes.
At the very least, there is a change to the casefolding algorithm.
Cherokee was classified as unicameral but is now considered bicameral
(two cases, like English). Unusually, case-folding Cherokee maps to
uppercase rather than lowercase.
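To make the Cherokee change concrete, here is a minimal sketch. It assumes an interpreter whose unicodedata tables are Unicode 8.0 or later; on older tables the small letter is unassigned and the checks below would not hold. The constant names are just for illustration.

```python
# Minimal sketch, assuming Unicode 8.0+ tables (older tables lack U+AB70).
CHEROKEE_CAPITAL_A = "\u13a0"  # CHEROKEE LETTER A (already in Unicode before 8.0)
CHEROKEE_SMALL_A = "\uab70"    # CHEROKEE SMALL LETTER A (added in Unicode 8.0)

# Most bicameral scripts, like English, fold to lowercase.
print("HELLO".casefold())                                   # hello

# Cherokee folds the other way: the small letter folds to the capital...
print(CHEROKEE_SMALL_A.casefold() == CHEROKEE_CAPITAL_A)    # True
# ...and the capital folds to itself,
print(CHEROKEE_CAPITAL_A.casefold() == CHEROKEE_CAPITAL_A)  # True
# even though lower() maps the capital to the new small letter.
print(CHEROKEE_CAPITAL_A.lower() == CHEROKEE_SMALL_A)       # True
```

Either way, caseless comparison via casefold() still treats the two letters as equal; only the direction of the folded form is unusual.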
The full set of changes is listed here:
http://unicode.org/versions/Unicode8.0.0/
Apart from the addition of 7716 characters and the changes to
str.casefold(), I don't think any of the changes will make a big
difference to Python's implementation. But it would be good to support
Unicode 8 (to the degree that Python actually does support Unicode,
rather than just the character-set part of it).
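As a rough way to see which version of the database a given interpreter actually carries, something like the following sketch should work. unicodedata.unidata_version and unicodedata.name(ch, default) are real stdlib APIs; the two helper functions are hypothetical names for illustration, and using U+AB70 as the probe character is simply one choice among the 8.0 additions.

```python
import unicodedata


def unidata_version_tuple() -> tuple:
    """Hypothetical helper: parse unicodedata.unidata_version (e.g. '8.0.0') into ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def knows_unicode_8() -> bool:
    """Hypothetical helper: True if the tables include a Unicode 8.0 addition.

    U+AB70 CHEROKEE SMALL LETTER A was added in 8.0; unicodedata.name()
    returns the supplied default (None here) for unassigned code points.
    """
    return unicodedata.name("\uab70", None) is not None


print(unicodedata.unidata_version)
print(unidata_version_tuple() >= (8, 0, 0), knows_unicode_8())
```

Probing for a specific character rather than only comparing version strings has the advantage of testing what the tables actually contain.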
> There are additional codepoints and a renamed property (which the
> standard library doesn't support anyway).
Which one are you referring to, Indic_Matra_Category renamed to
Indic_Positional_Category?
--
Steve