utf-8 and ctypes

MRAB python at mrabarnett.plus.com
Tue Sep 28 19:33:12 EDT 2010


On 28/09/2010 23:54, Brendan Miller wrote:
> I'm using python 2.5.
>
> Currently I have some python bindings written in ctypes. On the C
> side, my strings are in utf-8. On the python side I use
> ctypes.c_char_p to convert my strings to python strings. However, this
> seems to break for non-ascii characters.
>
> It seems that characters not in the ascii subset of UTF-8 are
> discarded by c_char_p during the conversion, or at least they don't
> print out when I go to print the string.
>
> Does python not support utf-8 strings? Is there some other way I
> should be doing the conversion?
>
Python does support bytestrings (8 bits per character, 'str' in Python
2), so you should be getting all the bytes that comprise the UTF-8
string (up to the terminating zero byte). You then need to decode that
bytestring from UTF-8 to Unicode, e.g. with its .decode('utf-8') method.
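
As a rough sketch of that decoding step: the byte literal below is only a
stand-in for whatever your C function actually returns, and the names
raw, ptr, data and text are made up for illustration. The b'' prefix
needs Python 2.6 or later; on 2.5 a plain '...' literal should behave the
same way.

```python
import ctypes

# Hypothetical stand-in for a UTF-8 string coming back from the C side:
# "cafe" with an accented e, a space, and a euro sign.
raw = b'caf\xc3\xa9 \xe2\x82\xac'

# c_char_p gives you the raw bytes up to the terminating zero byte.
ptr = ctypes.c_char_p(raw)
data = ptr.value

# Nothing has been discarded: all 9 bytes are still there.
assert len(data) == 9

# Decode the UTF-8 bytes into a Unicode string (6 characters).
text = data.decode('utf-8')
assert len(text) == 6
```

Whether the decoded text then prints correctly depends on the encoding of
your terminal, which is a separate matter from the ctypes conversion.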


