utf-8 and ctypes

Diez B. Roggisch deets at web.de
Thu Sep 30 04:59:28 EDT 2010


Brendan Miller <catphive at catphive.net> writes:

> 2010/9/29 Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand>:
>> In message <mailman.1132.1285714474.29448.python-list at python.org>, Brendan
>> Miller wrote:
>>
>>> It seems that characters not in the ascii subset of UTF-8 are
>>> discarded by c_char_p during the conversion ...
>>
>> Not a chance.
>>
>>> ... or at least they don't print out when I go to print the string.
>>
>> So it seems there’s a problem on the printing side. What happens when you
>> construct a UTF-8-encoded string directly in Python and try printing it the
>> same way?
>
> Doing this seems to confirm something is broken in ctypes w.r.t. UTF-8...
>
> if I enter:
> str = "日本語のテスト"

What is this? Which encoding is used by your editor to produce this
byte-string?

If you want to be sure you have the right encoding, you need to do this:

 - put a coding declaration such as "# -*- coding: utf-8 -*-" (or actually
   whatever encoding your editor uses) in the first or second line of the file
 - use unicode literals. Those are the funny little strings with a "u" in
   front of them. They will be *decoded* using the declared encoding.
 - when passing this to C, explicitly *encode* with utf-8 first.
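
The three steps above could look roughly like the sketch below. It assumes
the file really is saved as UTF-8, and the variable names (text, encoded,
c_str) are made up for illustration. It only round-trips the bytes through
c_char_p and does not call into any C library.

```python
# -*- coding: utf-8 -*-
# Step 1: the coding declaration above must match what the editor writes.
import ctypes

# Step 2: a unicode literal, *decoded* using the declared source encoding.
text = u"日本語のテスト"

# Step 3: explicitly *encode* to UTF-8 before handing the data to C.
encoded = text.encode("utf-8")

# c_char_p holds a char* pointing at the encoded bytes.
c_str = ctypes.c_char_p(encoded)

# Reading the value back yields the same bytes; nothing is discarded.
round_tripped = c_str.value.decode("utf-8")
```

If round_tripped compares equal to text, the non-ASCII bytes survived
c_char_p, and any remaining problem is on the printing side.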

Diez


