Glyphs and graphemes [was Re: Cult-like behaviour]
Chris Angelico
rosuav at gmail.com
Mon Jul 16 17:05:24 EDT 2018
On Tue, Jul 17, 2018 at 6:54 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Chris Angelico <rosuav at gmail.com>:
>> Challenge: Reverse a string in UTF-8.
>
> Counter-challenge: Reverse a Unicode string:
>
> >>> s = "a\u0304e"
> >>> s
> 'āe'
> >>> L = list(s)
> >>> L.reverse()
> >>> "".join(L)
> 'ēa'
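A minimal sketch of a "cluster-aware" reverse, using only the standard library. The name `reverse_clusters` is hypothetical. It is an approximation: it attaches every code point with a nonzero canonical combining class to the preceding code point. That is not the full UAX #29 grapheme-cluster algorithm. ZWJ emoji sequences, flags and Hangul jamo are not handled, and some marks (many Indic vowel signs, for example) have combining class 0.

```python
import unicodedata

def reverse_clusters(s):
    """Reverse s, keeping each base character together with the
    combining marks that follow it (approximation, not full UAX #29)."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return "".join(reversed(clusters))

s = "a\u0304e"
print(ascii(reverse_clusters(s)))  # 'ea\u0304'
```

With this grouping the macron stays on the "a" instead of migrating to the "e".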
>
>> Challenge: Center text in UTF-8.
>
> Counter-challenge: Center a Unicode string:
>
> >>> t = s * 3
> >>> t
> 'āeāeāe'
> >>> t.center(9)
> 'āeāeāe'
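`str.center` pads by code-point count, and `t` is already 9 code points long, so nothing is added even though it shows as 6 columns. Here is a rough sketch that pads by an estimated display width instead. The helpers `display_width` and `center_visual` are hypothetical. The width estimate is deliberately crude: combining marks count as 0 columns and everything else counts as 1, which ignores East Asian wide characters, control characters and emoji. Any odd leftover column goes on the right, which does not necessarily match `str.center`'s placement.

```python
import unicodedata

def display_width(s):
    """Crude column estimate: combining marks take 0 columns, all else 1."""
    return sum(0 if unicodedata.combining(ch) else 1 for ch in s)

def center_visual(s, width, fillchar=" "):
    """Center s in `width` columns, padding by display_width() not len()."""
    pad = width - display_width(s)
    if pad <= 0:
        return s
    left = pad // 2
    right = pad - left  # any odd leftover column goes on the right
    return fillchar * left + s + fillchar * right

t = "a\u0304e" * 3
print(len(t), display_width(t))     # 9 6
print(repr(center_visual(t, 9)))    # one space before, two after
```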
>
>> Challenge: Given a (non-initial) character in a buffer of UTF-8 bytes,
>> find the immediately preceding character.
>
> The counter-challenge is left as an exercise for the reader.
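For the UTF-8 challenge, the usual technique relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx. Starting just before the current position, you step backwards until you reach a byte that is not a continuation byte. A minimal sketch follows. The function name `prev_char_start` is hypothetical, and it assumes the buffer is valid UTF-8 and that `i` sits on a character boundary.

```python
def prev_char_start(buf, i):
    """Return the index where the code point before position i starts.

    Assumes buf is valid UTF-8 and i is a character boundary with
    0 < i <= len(buf).
    """
    if not 0 < i <= len(buf):
        raise IndexError("i must satisfy 0 < i <= len(buf)")
    j = i - 1
    # Skip continuation bytes (top two bits are 10).
    while j > 0 and (buf[j] & 0xC0) == 0x80:
        j -= 1
    return j

buf = "a\u0304e".encode("utf-8")   # b'a\xcc\x84e'
start = prev_char_start(buf, 3)    # code point just before the 'e'
print(start, buf[start:3].decode("utf-8") == "\u0304")  # 1 True
```

The "character" found here is the combining macron on its own, not "ā". Stepping back by graphemes would need further logic on top of the byte-level step.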
>
>> All of these are fundamentally difficult by nature, but if you index
>> by code points, you eliminate one level of difficulty; indexing by
>> bytes retains all the existing difficulty and adds another layer.
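The extra layer that byte indexing adds shows up when you reverse a string. Reversing a `str` by code points still yields a valid string, even though a combining mark may land on a different base. Reversing the UTF-8 bytes also reverses the bytes inside each multi-byte sequence, so the result usually stops being valid UTF-8 at all. A small sketch follows. `is_valid_utf8` is a hypothetical helper, not a stdlib function.

```python
def is_valid_utf8(data):
    """True if data decodes as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

s = "a\u0304e"
by_code_point = s[::-1]               # still a valid str
by_byte = s.encode("utf-8")[::-1]     # b'e\x84\xcca'
print(ascii(by_code_point))           # 'e\u0304a'
print(is_valid_utf8(by_byte))         # False
```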
>
> Oh, sorry. I thought you were suggesting Unicode strings would make the
> challenges somehow easy.

So now that you've actually read my entire post, you'll see that there
are fundamental difficulties, but that UTF-8 introduces more. Great.
Now go ahead and reply to my post, knowing my actual point.
Congratulations on posting something of no value.
ChrisA