
Guido van Rossum wrote:
2008/8/25 M.-A. Lemburg <mal@egenix.com>:
I would really like to see more Unicode support in Python, e.g. for collation, compression, indexing based on graphemes and code points, better support for special casing situations (to cover e.g. the dotted vs. non-dotted i in the Turkish scripts), etc.
There are also a few changes that we'd need to incorporate into the UTF codecs, e.g. warn about more ill-formed byte sequences.
Would Google be willing to contribute such support or part of it?
That depends purely on how much need Google itself has for these features. I'll ask around, but for now I wouldn't bet on anything beyond the three points I raised at the start of this thread:
1. Upgrade the unicodedata module to the Unicode 5.1.0 standard
2. Extend the unicodedata module with some additional properties
3. Add support for Unicode properties to the regex syntax, including Boolean combinations
I think an Improve Unicode Support PEP would be a good idea to collect (and get approval or not for) various ideas from various people, even if Google only implements part of the PEP.
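
To make the items discussed above a little more concrete, here is a rough sketch using only the standard library as it exists in current Python 3. It is an illustration, not a proposed implementation: the helper names filter_by_category and is_well_formed_utf8 are made up for this example, and the category filter is only a crude stand-in for property support in the regex syntax, which the stdlib re module does not provide.

```python
import unicodedata


def filter_by_category(text, prefix):
    """Keep only code points whose general category starts with `prefix`.

    Hypothetical helper: approximates what a regex property class for
    "all letters" would match, one code point at a time.
    """
    return "".join(ch for ch in text if unicodedata.category(ch).startswith(prefix))


def is_well_formed_utf8(data):
    """Return True if `data` decodes under the strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Version of the Unicode database this interpreter was built with.
print(unicodedata.unidata_version)

# Property-style filtering: letters only (categories Lu, Ll, Lt, Lm, Lo).
print(filter_by_category("abc 123 \u00e7\u00e9", "L"))

# Default case mapping is locale-independent: "I" lowercases to "i",
# whereas Turkish would expect U+0131 (dotless i).
print("I".lower() == "i")
print(unicodedata.name("\u0131"))

# An overlong encoding of "/" is ill-formed and rejected by the strict codec.
print(is_well_formed_utf8(b"\xc0\xaf"))
```

The Turkish case shows why tailored (language-sensitive) case mapping cannot be expressed through str.lower() and str.upper() alone.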