utf-8 and ctypes
Diez B. Roggisch
deets at web.de
Thu Sep 30 10:59:28 CEST 2010
Brendan Miller <catphive at catphive.net> writes:
> 2010/9/29 Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand>:
>> In message <mailman.1132.1285714474.29448.python-list at python.org>, Brendan
>> Miller wrote:
>>> It seems that characters not in the ascii subset of UTF-8 are
>>> discarded by c_char_p during the conversion ...
>> Not a chance.
>>> ... or at least they don't print out when I go to print the string.
>> So it seems there’s a problem on the printing side. What happens when you
>> construct a UTF-8-encoded string directly in Python and try printing it the
>> same way?
> Doing this seems to confirm something is broken in ctypes w.r.t. UTF-8...
> if I enter:
> str = "日本語のテスト"
What is this? Which encoding is used by your editor to produce this
byte string?
If you want to be sure you have the right encoding, you need to do this:
- put a "# -*- coding: utf-8 -*-" declaration (or actually whatever
encoding your editor uses) in the first or second line of the file
- use unicode literals. Those are the funny little strings with a "u" in
front of them. They will be *decoded* using the declared encoding.
- when passing this to C, explicitly *encode* with utf-8 first.
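Put together, the three steps could look roughly like this. It is only a sketch: the variable names are made up for illustration, it assumes the file really is saved as UTF-8, and it just round-trips the bytes through c_char_p rather than calling your actual C function.

```python
# -*- coding: utf-8 -*-
# The coding declaration above must be on the first or second line of the
# file and must match the encoding the editor actually saves in.
import ctypes

# Unicode literal: decoded from the source bytes using the declared encoding.
text = u"日本語のテスト"

# Explicitly encode to UTF-8 bytes before handing anything to C.
encoded = text.encode("utf-8")

# c_char_p wraps the raw bytes; .value returns them unchanged.
c_str = ctypes.c_char_p(encoded)

# Decode again on the way back so Python sees characters, not bytes.
round_trip = c_str.value.decode("utf-8")
```

If round_trip compares equal to text, nothing was discarded on the ctypes side, and any remaining problem is in how the string gets printed. On Python 3 the "u" prefix is optional, but the explicit encode is still needed because c_char_p expects bytes.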