Glyphs and graphemes [was Re: Cult-like behaviour]
Marko Rauhamaa
marko at pacujo.net
Mon Jul 16 16:54:35 EDT 2018
Chris Angelico <rosuav at gmail.com>:
> Challenge: Reverse a string in UTF-8.
Counter-challenge: Reverse a Unicode string:
>>> s = "a\u0304e"
>>> s
'āe'
>>> L = list(s)
>>> L.reverse()
>>> "".join(L)
'ēa'
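The macron has moved from the "a" to the "e" because the list is reversed one code point at a time, so U+0304 ends up after the wrong base letter. A rough sketch of a reversal that keeps combining marks with their base follows. It assumes that "a base code point plus any trailing code points with a nonzero combining class" is a good enough stand-in for a grapheme; that is much cruder than the segmentation rules in UAX #29, and the helper names clusters and reverse_clusters are made up for illustration.

```python
import unicodedata

def clusters(s):
    """Split s into naive clusters: one base code point plus any
    combining marks that follow it (an approximation, not UAX #29)."""
    out = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch      # attach the mark to the preceding base
        else:
            out.append(ch)     # start a new cluster
    return out

def reverse_clusters(s):
    """Reverse s cluster by cluster instead of code point by code point."""
    return "".join(reversed(clusters(s)))

s = "a\u0304e"
print(reverse_clusters(s))     # "e", then "a" with its macron still attached
```

Even this simplified version fails on emoji ZWJ sequences, Hangul jamo and regional-indicator pairs, which need real grapheme segmentation.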
> Challenge: Center text in UTF-8.
Counter-challenge: Center a Unicode string:
>>> t = s * 3
>>> t
'āeāeāe'
>>> t.center(9)
'āeāeāe'
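str.center() does nothing here because t already contains nine code points, even though it occupies only six columns on screen. A minimal sketch of centering by visible width instead, assuming every non-combining code point takes exactly one column (false for wide East Asian characters and for zero-width characters); visible_len and center_visible are hypothetical helpers, not anything from the standard library:

```python
import unicodedata

def visible_len(s):
    """Count code points that are not combining marks."""
    return sum(1 for ch in s if not unicodedata.combining(ch))

def center_visible(s, width, fill=" "):
    """Pad s on both sides so that its visible length reaches width."""
    pad = max(0, width - visible_len(s))
    left = pad // 2            # any odd leftover column goes on the right
    return fill * left + s + fill * (pad - left)

t = "a\u0304e" * 3
print(repr(center_visible(t, 9)))
```

With these assumptions the result has one space on the left and two on the right. A real implementation would also have to consult something like unicodedata.east_asian_width() and know how the terminal actually renders each cluster.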
> Challenge: Given a (non-initial) character in a buffer of UTF-8 bytes,
> find the immediately preceding character.
The counter-challenge is left as an exercise for the reader.
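For what it is worth, both versions of the exercise can be sketched with the same loop: step back one unit, then keep stepping while the unit is a "continuation". The sketch below assumes the buffer is valid UTF-8, that i already points at the start of a character, and (for the string case) the same crude combining-mark approximation of a grapheme; both function names are invented for illustration.

```python
import unicodedata

def prev_utf8_start(buf, i):
    """Byte index where the code point immediately before index i starts.
    Assumes buf is valid UTF-8 and i > 0 is itself a code point start."""
    i -= 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:   # skip 10xxxxxx continuation bytes
        i -= 1
    return i

def prev_cluster_start(s, i):
    """Code point index where the naive cluster immediately before index i
    starts. Assumes i > 0 is itself the start of a cluster."""
    i -= 1
    while i > 0 and unicodedata.combining(s[i]):   # skip combining marks
        i -= 1
    return i

s = "ea\u0304e"
buf = s.encode("utf-8")
print(prev_utf8_start(buf, 4), prev_cluster_start(s, 3))
```

The UTF-8 loop is exact, since continuation bytes are identified by their top two bits, whereas the code point loop is only as good as its definition of a cluster.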
> All of these are fundamentally difficult by nature, but if you index
> by code points, you eliminate one level of difficulty; indexing by
> bytes retains all the existing difficulty and adds another layer.
Oh, sorry. I thought you were suggesting Unicode strings would make the
challenges somehow easy.
Marko