Grapheme clusters, a.k.a. real characters
Terry Reedy
tjreedy at udel.edu
Fri Jul 14 20:02:48 EDT 2017
On 7/14/2017 5:51 PM, Marko Rauhamaa wrote:
> Yes, in Python2, Go, C and GNU textutils, when you print a text string
> containing a mixture of languages, you see characters.
>
> Why?
>
> Because that's what the terminal emulator chooses to do upon receiving
> those bytes.
>>> s = u'\u1171\u2222\u3333\u4444\u5555'
>>> s
u'\u1171\u2222\u3333\u4444\u5555'
>>> print(s)
ᅱ∢㌳䑄啕
>>> b = s.encode('utf-8')
>>> b
'\xe1\x85\xb1\xe2\x88\xa2\xe3\x8c\xb3\xe4\x91\x84\xe5\x95\x95'
>>> print(b)
ᅱ∢㌳䑄啕
I prefer the accurate 5-character print of the text string to the print
of the bytes.
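
To make the text/bytes distinction concrete, here is a minimal sketch in Python 3 syntax (the session above is Python 2). It reuses the same five code points. The latin-1 decode is only an illustrative stand-in for a consumer that guesses the wrong encoding; it is not a claim about what any particular terminal does.

```python
# Python 3 sketch; the session above was run under Python 2.
s = '\u1171\u2222\u3333\u4444\u5555'   # the same five code points
b = s.encode('utf-8')                  # the bytes a terminal would receive

print(len(s))   # length of the text string, in code points
print(len(b))   # length of the UTF-8 encoding, in bytes

# A consumer that assumes UTF-8 recovers the original text.
roundtrip = b.decode('utf-8')

# A consumer that assumes a single-byte encoding (latin-1, chosen here
# only for illustration) yields one character per byte instead.
wrong = b.decode('latin-1')
print(len(wrong))
```

The text string carries its own meaning, while the bytes mean something only once the receiver picks an encoding.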
--
Terry Jan Reedy