Python Unicode handling wins again -- mostly

wxjmfauth at gmail.com wxjmfauth at gmail.com
Mon Dec 2 13:39:26 CET 2013


On Sunday, 1 December 2013 21:54:48 UTC+1, Tim Delaney wrote:
> On 2 December 2013 07:15,  <wxjm... at gmail.com> wrote:
>
> 0.11.13 02:44, Steven D'Aprano wrote:
>
> > (2) If you reverse that string, does it give "lëon"? The implication of
> > this question is that strings should operate on grapheme clusters rather
> > than code points. ...
>
> BTW, a grapheme cluster *is* a code points cluster.
>
> Anyone with a decent level of reading comprehension would have understood that Steven knows that. The implied word is "individual" i.e. "... rather than [individual] code points".
>
> Why am I responding to a troll? Probably because out of all his baseless complaints about the FSR, he *did* have one valid point about performance that has now been fixed.
>
> Tim Delaney


My English is far from perfect, but I think I understood
it correctly.

The point is not in the words "grapheme" or "code point",
nor in "individual" ;-). The point is in "rather".

If one wishes to work on a set of graphemes, one can
only work with the set of the corresponding code points.
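
A minimal sketch of what this means in practice. It assumes the string
under discussion is "noël" written with a decomposed ë ('e' followed by
U+0308 COMBINING DIAERESIS), which the quoted text does not show. The
helper simple_clusters is hypothetical: it only attaches combining marks
to the preceding base character and is not the full Unicode grapheme
cluster algorithm (UAX #29). The clusters it returns are themselves
nothing more than short runs of code points.

```python
import unicodedata


def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper: a simplification, not full UAX #29 segmentation.
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Assumed input: "noël" with a decomposed ë ('e' + COMBINING DIAERESIS).
word = "noe\u0308l"

# Reversing individual code points moves the diaeresis onto the 'l'.
by_code_point = word[::-1]

# Reversing clusters keeps the diaeresis attached to the 'e'.
by_cluster = "".join(reversed(simple_clusters(word)))

print(ascii(by_code_point))
print(ascii(by_cluster))
```

Either way the work is done on code points; the only difference is
whether they are handled one at a time or in groups.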


To complete Serhiy Storchaka's example:

>>> import unicodedata
>>> len(unicodedata.normalize('NFKD', '\ufdfa')) == 18
True

is correct.
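
For comparison, a small sketch that applies all four normalization forms
to the same character. The variable names are illustrative only. U+FDFA
has a compatibility decomposition but no canonical one, so only the
compatibility forms (NFKC, NFKD) should expand it, while NFC and NFD
leave it as a single code point.

```python
import unicodedata

ligature = "\ufdfa"  # ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM

# Number of code points after each normalization form.
lengths = {
    form: len(unicodedata.normalize(form, ligature))
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, n in lengths.items():
    print(form, n)
```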

jmf

PS: I did not even mention the FSR.


