utf-8 and ctypes

MRAB python at mrabarnett.plus.com
Wed Sep 29 18:34:22 EDT 2010


On 29/09/2010 19:33, Brendan Miller wrote:
> 2010/9/29 Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand>:
>> In message <mailman.1132.1285714474.29448.python-list at python.org>, Brendan
>> Miller wrote:
>>
>>> It seems that characters not in the ascii subset of UTF-8 are
>>> discarded by c_char_p during the conversion ...
>>
>> Not a chance.
>>
>>> ... or at least they don't print out when I go to print the string.
>>
>> So it seems there’s a problem on the printing side. What happens when you
>> construct a UTF-8-encoded string directly in Python and try printing it the
>> same way?
> 
> Doing this seems to confirm something is broken in ctypes w.r.t. UTF-8...
> 
> if I enter:
> str = "日本語のテスト"
> 
> Then:
> print str
> 日本語のテスト
> 
> However, when I create a string buffer, pass it into my c++ code, and
> write the same UTF-8 string into it, python seems to discard pretty
> much all the text. The same code works for pure ascii strings.
> 
> Python code:
> _std_string_size = _lib_mbxclient.std_string_size
> _std_string_size.restype = c_long
> _std_string_size.argtypes = [c_void_p]
> 
> _std_string_copy = _lib_mbxclient.std_string_copy
> _std_string_copy.restype = None
> _std_string_copy.argtypes = [c_void_p, POINTER(c_char)]
> 
> # This function works for ascii, but breaks on strings with UTF-8!
> def std_string_to_string(str_ptr):
>      buf = create_string_buffer(_std_string_size(str_ptr))
>      _std_string_copy(str_ptr, buf)
>      return buf.raw
> 
> C++ code:
> 
> extern "C"
> long std_string_size(string* str)
> {
> 	return str->size();
> }
> 
> extern "C"
> void std_string_copy(string* str, char* buf)
> {
> 	std::copy(str->begin(), str->end(), buf);
> }

It might have something to do with the character encoding of your
source files.
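
To tell the ctypes side apart from the C++ side, you can push known UTF-8 bytes through a ctypes buffer without loading the C++ library at all. The sketch below is written for Python 3 (the code in the thread is Python 2) and uses `ctypes.memmove` as a stand-in for `std_string_copy`; that substitution is an assumption for illustration, not the poster's code. It also shows that the same text produces different bytes in UTF-8 and in Shift_JIS, which is the kind of mismatch you would get if a source file were saved in a non-UTF-8 encoding.

```python
import ctypes

TEXT = "日本語のテスト"


def copy_through_ctypes(data: bytes) -> bytes:
    """Copy raw bytes into a ctypes char buffer and read them back.

    ctypes.memmove stands in for the C++ std_string_copy (an assumption).
    """
    # len(data) zero-initialised bytes, no extra NUL terminator
    buf = ctypes.create_string_buffer(len(data))
    ctypes.memmove(buf, data, len(data))
    # .raw returns every byte in the buffer, non-ASCII ones included
    return buf.raw


utf8 = TEXT.encode("utf-8")
sjis = TEXT.encode("shift_jis")

roundtrip = copy_through_ctypes(utf8)
print(roundtrip == utf8)                  # True: no bytes were dropped
print(roundtrip.decode("utf-8") == TEXT)  # True
print(len(utf8), len(sjis))               # 21 14
print(utf8 == sjis)                       # False: same text, different bytes
```

If this round trip preserves the bytes, ctypes itself is not discarding anything, and the question becomes which bytes the C++ `std::string` actually holds.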

Also, try printing out the character codes in the string, and the size
of the string's character type, from the C++ code.
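
A similar check can be made on the Python side, on whatever `std_string_to_string` returns. A minimal sketch, assuming Python 3 and `bytes` input; `describe_bytes` is a hypothetical helper name:

```python
def describe_bytes(data: bytes) -> str:
    """Return the byte count, a hex dump and a UTF-8 validity verdict."""
    hex_codes = " ".join(f"{b:02x}" for b in data)
    try:
        data.decode("utf-8")
        verdict = "valid UTF-8"
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that breaks UTF-8
        verdict = f"not UTF-8 (first bad byte at offset {exc.start})"
    return f"{len(data)} bytes: {hex_codes} [{verdict}]"


print(describe_bytes("日本".encode("utf-8")))
# 6 bytes: e6 97 a5 e6 9c ac [valid UTF-8]
print(describe_bytes("日本".encode("shift_jis")))
```

If the dump shows valid UTF-8 of the expected length, the bytes crossed the ctypes boundary intact and the problem is on the printing side; if the length or the codes are wrong, the C++ string already holds different bytes.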
