Encoding of surrogate code points to UTF-8
wxjmfauth at gmail.com
wxjmfauth at gmail.com
Tue Oct 8 14:24:58 EDT 2013
--------
>>> sys.version
'3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)]'
>>> '\ud800'.encode('utf-8')
Traceback (most recent call last):
File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0:
surrogates not allowed
>>> '\ud800'.encode('utf-32-be')
b'\x00\x00\xd8\x00'
>>> '\ud800'.encode('utf-32-le')
b'\x00\xd8\x00\x00'
>>> '\ud800'.encode('utf-32')
b'\xff\xfe\x00\x00\x00\xd8\x00\x00'
jmf
More information about the Python-list
mailing list