Grapheme clusters, a.k.a. real characters
Neil Cerutti
neilc at norwich.edu
Fri Jul 14 14:22:46 EDT 2017
On 2017-07-14, Rhodri James <rhodri at kynesim.co.uk> wrote:
> On 14/07/17 15:32, Michael Torrie wrote:
>> Are you saying that dealing with Unicode in Google Go, which
>> uses UTF-8 in memory, is adding an extra layer of complexity
>> and makes things worse than they might be in Python?
>
> I'm not familiar with Go. If the programmer has to be aware
> that she is using UTF-8 under the hood, then yes, it does
> add an extra layer of complexity. You have to remember the
> rules of UTF-8 as well as everything else.

Go represents strings as sequences of bytes. It provides separate
APIs that let you treat those bytes either as plain old bytes or
as a sequence of runes (code points, not necessarily normalized).
If your byte strings aren't in UTF-8, then Go Away.
https://blog.golang.org/strings
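A minimal sketch of the two views, using only Go's standard `unicode/utf8` package. The helper name `counts` and the sample strings are illustrative choices, not taken from the linked post. It shows the byte view (`len`, indexing), the rune view (`range`, `utf8.RuneCountInString`), that Go does not normalize (a decomposed and a precomposed "é" compare unequal), and that non-UTF-8 input is not rejected but decoded as `utf8.RuneError`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts reports the byte length, the rune count, and whether s is valid UTF-8.
// (Hypothetical helper, for illustration only.)
func counts(s string) (nBytes, nRunes int, valid bool) {
	return len(s), utf8.RuneCountInString(s), utf8.ValidString(s)
}

func main() {
	// The same visible "é" in two forms: decomposed (e + U+0301 COMBINING
	// ACUTE ACCENT) and precomposed (U+00E9). Go does not normalize them.
	decomposed := "e\u0301"
	precomposed := "\u00e9"
	fmt.Println(decomposed == precomposed) // false: == compares bytes

	for _, s := range []string{decomposed, precomposed} {
		b, r, _ := counts(s)
		fmt.Printf("%+q: %d bytes, %d runes\n", s, b, r)
	}

	// Byte view: indexing a string yields individual bytes.
	for i := 0; i < len(decomposed); i++ {
		fmt.Printf("byte %d: %#x\n", i, decomposed[i])
	}

	// Rune view: range decodes UTF-8; i is the byte offset of each rune.
	for i, r := range decomposed {
		fmt.Printf("rune at byte %d: %U\n", i, r)
	}

	// Non-UTF-8 bytes are not rejected; range yields utf8.RuneError (U+FFFD).
	for _, r := range "\xff" {
		fmt.Println(r == utf8.RuneError) // true
	}
}
```

Note that neither view gives you grapheme clusters: the decomposed string is one user-perceived character but two runes and three bytes.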
--
Neil Cerutti