no struct.pack for unicode strings?
Radovan Garabik
spam at melkor.dnp.fmph.uniba.sk
Fri Oct 18 04:13:13 EDT 2002
Martin v. Loewis <martin at v.loewis.de> wrote:
: Radovan Garabik <spam at melkor.dnp.fmph.uniba.sk> writes:
:> Why is there no format to pack/unpack unicode strings?
:> Or am I missing something?
: It's not entirely clear what struct.pack should do with a Unicode
: object: UTF-8, UTF-16 (big or little endian, with or without BOM),
: UTF-32 (big or little endian, with or without BOM), system encoding,
: ...
It should pack them as raw Py_UNICODE data. At least, that is what
I need.
: Hence, no packing is provided.
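To illustrate the ambiguity described above, here is a minimal sketch in present-day Python 3 (not the 2002 interpreter this thread is about). The sample string and the `sizes` dictionary are made up for the illustration; the codec names are the standard ones. The same text comes out as a different byte sequence, and a different length, under each encoding:

```python
# Hypothetical sample: 7 characters, one of them non-ASCII.
text = "Garab\u00edk"

# Standard codec names; the names without -le/-be prepend a byte order mark.
codecs_to_try = ("utf-8", "utf-16-le", "utf-16-be", "utf-16",
                 "utf-32-le", "utf-32-be", "utf-32")

sizes = {}
for name in codecs_to_try:
    data = text.encode(name)
    sizes[name] = len(data)
    # Print the codec, the encoded length, and the raw bytes in hex.
    print(f"{name:10} {len(data):3} bytes  {data.hex()}")
```

A single struct format code would have to pick one of these layouts on the caller's behalf, which is presumably why none was provided.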
:> My application needs to struct.pack unicode strings, to save them into a
:> file which can then be read by a C extension module where I need to
:> access the characters of the string (as Py_UNICODE).
: To save Unicode in a file, I recommend encoding it as UTF-8 and
: using PyUnicode_DecodeUTF8 in your extension module to restore the
: Unicode object.
This is exactly what I am trying to avoid, since I need to quickly loop
over the strings (it is a dictionary index) written in the file - hence
the C extension module.
I am afraid that using PyUnicode_DecodeUTF8 (or anything that creates a
PyObject) would impose a big speed penalty.
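One way to get the fixed-width layout asked for here, without a dedicated struct format code, is to encode explicitly and add a length prefix. The following is only a sketch in present-day Python 3, under stated assumptions: `utf-32-le` stands in for a 4-byte Py_UNICODE build (builds of that era used either 2 or 4 bytes per unit), and the record layout and the helper names `pack_record`, `iter_records` and `code_unit` are invented for this example. A C reader could walk the same layout with plain pointer arithmetic and never create a PyObject.

```python
import struct

CODEC = "utf-32-le"  # assumption: stand-in for a 4-byte Py_UNICODE build
UNIT = 4             # bytes per code unit under that assumption


def pack_record(text):
    """Return a 4-byte little-endian unit count followed by the raw units."""
    data = text.encode(CODEC)
    return struct.pack("<I", len(data) // UNIT) + data


def iter_records(buf):
    """Yield (data_offset, unit_count) for each record, decoding nothing."""
    view = memoryview(buf)
    pos = 0
    while pos < len(view):
        (count,) = struct.unpack_from("<I", view, pos)
        pos += 4              # skip the length prefix
        yield pos, count
        pos += count * UNIT   # jump straight to the next record


def code_unit(buf, data_offset, index):
    """Read one code unit as an integer without building a str."""
    return struct.unpack_from("<I", buf, data_offset + index * UNIT)[0]


# Hypothetical dictionary index containing two entries.
blob = b"".join(pack_record(w) for w in ["ahoj", "Garab\u00edk"])
records = list(iter_records(blob))
```

Because every unit has the same width, the character at a given index is found by a multiplication rather than by scanning, which is the property UTF-8 lacks.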
--
-----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik |
| __..--^^^--..__ garabik @ fmph . uniba . sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!