utf-8 and ctypes
python at mrabarnett.plus.com
Thu Sep 30 00:34:22 CEST 2010
On 29/09/2010 19:33, Brendan Miller wrote:
> 2010/9/29 Lawrence D'Oliveiro<ldo at geek-central.gen.new_zealand>:
>> In message<mailman.1132.1285714474.29448.python-list at python.org>, Brendan
>> Miller wrote:
>>> It seems that characters not in the ascii subset of UTF-8 are
>>> discarded by c_char_p during the conversion ...
>> Not a chance.
>>> ... or at least they don't print out when I go to print the string.
>> So it seems there’s a problem on the printing side. What happens when you
>> construct a UTF-8-encoded string directly in Python and try printing it the
>> same way?
> Doing this seems to confirm something is broken in ctypes w.r.t. UTF-8...
> if I enter:
> str = "日本語のテスト"
> print str
> However, when I create a string buffer, pass it into my c++ code, and
> write the same UTF-8 string into it, python seems to discard pretty
> much all the text. The same code works for pure ascii strings.
> Python code:
> _std_string_size = _lib_mbxclient.std_string_size
> _std_string_size.restype = c_long
> _std_string_size.argtypes = [c_void_p]
> _std_string_copy = _lib_mbxclient.std_string_copy
> _std_string_copy.restype = None
> _std_string_copy.argtypes = [c_void_p, POINTER(c_char)]
> # This function works for ascii, but breaks on strings with UTF-8!
> def std_string_to_string(str_ptr):
>     buf = create_string_buffer(_std_string_size(str_ptr))
>     _std_string_copy(str_ptr, buf)
>     return buf.raw
> C++ code:
> extern "C"
> long std_string_size(string* str)
> {
>     return str->size();
> }
> extern "C"
> void std_string_copy(string* str, char* buf)
> {
>     std::copy(str->begin(), str->end(), buf);
> }
It might have something to do with the character encoding of your
Also, try printing out the character codes of the string and the size
of the string's character in the C++ code.
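
On the Python side, the same check could look roughly like the sketch below. It is only an illustration: ctypes.memmove stands in for your std_string_copy (I don't have your library), and it is written in Python 3 syntax, so on 2.x the print calls would need adjusting. The idea is to look at the raw byte values rather than at what the console displays.

```python
import ctypes
import sys

text = u"日本語のテスト"
encoded = text.encode("utf-8")  # 7 characters, 3 bytes each in UTF-8

# Stand-in for the C++ std_string_copy: copy the raw bytes into a ctypes buffer.
buf = ctypes.create_string_buffer(len(encoded))
ctypes.memmove(buf, encoded, len(encoded))

raw = buf.raw
codes = list(bytearray(raw))  # integer value of every byte in the buffer

# Print sizes and byte codes only, so the output does not depend on the console.
print("buffer size:", len(raw))
print("byte codes:", " ".join("%02x" % b for b in codes))
print("stdout encoding:", getattr(sys.stdout, "encoding", None))

# If the bytes survived, decoding them gives back the original text.
round_trip = raw.decode("utf-8")
print("round trip ok:", round_trip == text)
```

If the byte codes coming back from your real std_string_to_string match what the C++ side reports, then ctypes is passing the data through intact and the problem is in how the string is displayed.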