[Tutor] how to struct.pack a unicode string?
eryksun
eryksun at gmail.com
Sat Dec 1 02:28:24 CET 2012
On Fri, Nov 30, 2012 at 11:43 AM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
>
> How can I pack a unicode string using the struct module?
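struct.pack is for packing an arbitrary sequence of values into a C-like
struct. With an explicit byte order ("<", ">", "=" or "!") it adds no
alignment padding, so you have to add pad bytes manually (format code
"x"). Alternatively you can use a ctypes.Structure.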
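As a rough sketch of both options (Python 3 assumed; the two-field
layout and the Record class are made up for illustration, not taken
from your code):

```python
import ctypes
import struct

# Standard mode ("<") adds no alignment padding, so the 3 pad bytes
# between the 1-byte field and the 4-byte field are spelled out as "3x".
packed = struct.pack("<B3xI", 1, 2)
print(packed)

# Hypothetical equivalent as a ctypes.Structure: ctypes inserts the
# alignment padding itself, using the platform's native rules.
class Record(ctypes.Structure):
    _fields_ = [("flag", ctypes.c_uint8),
                ("value", ctypes.c_uint32)]

rec = Record(1, 2)
print(ctypes.sizeof(rec), struct.calcsize("@BI"))
```

Note that the ctypes version uses native byte order and alignment, so
its layout only matches the "<" format on a little-endian platform.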
The struct module supports plain byte strings, not Unicode, so you
have to encode the string first. UTF-8 was designed to encode all of
Unicode in a way that can seamlessly pass through libraries that
process C strings (i.e. an array of non-null bytes terminated by a
null byte). Byte values less than 128 are ASCII; beyond ASCII, UTF-8
uses 2-4 bytes per character, all with values greater than 127, in a
fixed byte order. In contrast, UTF-16 and UTF-32 can have null bytes
in the string, and their byte order defaults to the platform's unless
you specify it. The length and order of the optional byte order mark
(BOM) distinguish UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. There's
also a UTF-8 BOM, used mostly on Windows. Python calls this encoding
"utf-8-sig".
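You can check these properties at the interpreter. This is only an
illustration, assuming Python 3; the sample string is my own choice:

```python
import codecs

text = "h\u00e9llo"  # sample string (an assumption); U+00E9 is non-ASCII

utf8 = text.encode("utf-8")
utf16le = text.encode("utf-16-le")
utf32be = text.encode("utf-32-be")

print(utf8)                # U+00E9 becomes two bytes, both > 127
print(b"\x00" in utf8)     # no null bytes in UTF-8 for this text
print(b"\x00" in utf16le)  # 2-byte code units: ASCII chars carry a null
print(len(utf8), len(utf16le), len(utf32be))

# Without an explicit byte order, Python prepends a BOM in native order.
print(text.encode("utf-16").startswith(codecs.BOM_UTF16))
print(text.encode("utf-32").startswith(codecs.BOM_UTF32))

# "utf-8-sig" prepends the 3-byte UTF-8 BOM on encode, strips it on decode.
with_sig = text.encode("utf-8-sig")
print(with_sig.startswith(codecs.BOM_UTF8))
print(with_sig.decode("utf-8-sig") == text)
```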
> fmt = endianness + str(len(hello)) + "s"
That's the wrong length. len(hello) counts characters, but the count
in front of "s" is a number of bytes. Use the length of the encoded
string.
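Something along these lines should work. It's a minimal sketch,
assuming Python 3, UTF-8 as the encoding, and a made-up value for
hello; the names hello, endianness and fmt follow your snippet:

```python
import struct

hello = "h\u00e9llo"             # 5 characters (sample value, an assumption)
encoded = hello.encode("utf-8")  # 6 bytes, since U+00E9 takes two

endianness = "<"
fmt = endianness + str(len(encoded)) + "s"  # byte count, not char count
packed = struct.pack(fmt, encoded)

# Round trip: unpack the bytes, then decode back to text.
unpacked = struct.unpack(fmt, packed)[0].decode("utf-8")
print(fmt, packed, unpacked == hello)

# With the character count instead, struct silently truncates the bytes.
truncated = struct.pack(endianness + str(len(hello)) + "s", encoded)
print(truncated)
```

The reader has to know the byte count to unpack, so in practice you'd
use a fixed field width or store the length alongside the string.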