[Python-Dev] Unicode support in getargs.c
Martin v. Loewis
martin@v.loewis.de
Wed, 2 Jan 2002 23:51:17 +0100
> I have a number of MacOSX API's that expect Unicode buffers, passed as
> "long count, UniChar *buffer".
Well, my first question would be: Are you sure that UniChar has the
same underlying integral type as Py_UNICODE? If not, you lose.
So you may need to do even more conversion.
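To make the "more conversion" case concrete, here is a minimal sketch of what it could look like if Py_UNICODE turned out to be 32 bits wide while UniChar is a 16-bit UTF-16 unit. Both typedefs and the function name wide_to_utf16 are hypothetical stand-ins for illustration; the real types come from Python.h and the Mac headers, and their widths would have to be checked there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, assumed for this sketch only. */
typedef uint16_t UniChar16;    /* assumed: UniChar is a 16-bit UTF-16 unit */
typedef uint32_t WideUnicode;  /* assumed: Py_UNICODE on a 32-bit-wide build */

/* Convert n code points to UTF-16 code units.
   Returns the number of units written, or (size_t)-1 if the output
   buffer is too small or a code point is not a valid scalar value. */
size_t wide_to_utf16(const WideUnicode *in, size_t n,
                     UniChar16 *out, size_t cap)
{
    size_t used = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t c = in[i];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return (size_t)-1;            /* not encodable */
        if (c < 0x10000) {
            if (used + 1 > cap)
                return (size_t)-1;
            out[used++] = (UniChar16)c;   /* one unit is enough */
        } else {
            if (used + 2 > cap)
                return (size_t)-1;
            c -= 0x10000;                 /* split into a surrogate pair */
            out[used++] = (UniChar16)(0xD800 | (c >> 10));
            out[used++] = (UniChar16)(0xDC00 | (c & 0x3FF));
        }
    }
    return used;
}
```

If the two types have the same width, none of this is needed and the buffer can be passed through unchanged.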
> I have the machinery in bgen to generate code for this, iff "u#" (or
> something else) would work the same as "s#", i.e. it returns you a
> pointer and a size, and it would work equally well for unicode
> objects as for classic strings (after conversion).
I see. u# could be made to work for Unicode objects alone, but it would
have to reject string objects.
> But as a general solution it doesn't look right: "How do I call a C
> routine with a string parameter?" "Use the "s" format and you get the
> string pointer to pass". "How do I call a C routine with a unicode string
> parameter?"
For that, the answer is u. But you want the length also. So for that,
the answer is u#. But your question is "How do I call a C routine with
either a Unicode object or a string object, getting a reasonable
Py_UNICODE* and the length?".
For that, I'd recommend using O& with a conversion function:
int Py_UnicodeOrString(PyObject *o, void *address){
  /* O& converter: store a new reference in *address and return 1,
     or return 0 with an exception set. */
  PyObject **result = (PyObject **)address;
  if (PyUnicode_Check(o)){
    Py_INCREF(o);
    *result = o;
    return 1;
  }
  if (PyString_Check(o)){
    *result = PyUnicode_FromObject(o);
    return *result != NULL;
  }
  PyErr_SetString(PyExc_TypeError,"unicode object expected");
  return 0;
}
> "Use O and PyUnicode_FromObject() and PyUnicode_AsUnicode and
> make sure you get all your decrefs right and.....".
With the function above, this becomes
Use O&, passing the function and the address of a PyObject* variable,
then using PyUnicode_AS_UNICODE and PyUnicode_GET_SIZE on the result and
performing a single DECREF at the end [allowing an encoding to be
specified is optional]
In this scenario, somebody *has* to deallocate memory, you cannot get
around this. It is your choice whether this is Py_DECREF or PyMem_Free
that you have to call (as with the "esomething" conversions); the
DECREF is more efficient as it will not copy a Unicode object.
> The "es#" is a very strange beast, and a similar "eu#" would help me a
> little, but it has some serious drawbacks. Aside from it being completely
> different from the other converters (being a prefix operator instead of a
> postfix one, and having a value-return argument) I would also have to
> pre-allocate the buffer in advance, and that sort of defeats the purpose.
You don't. If you set the buffer pointer to NULL before invoking getargs,
getargs allocates a buffer of the needed size for you, and you have to
PyMem_Free it afterwards.
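The ownership rule can be modelled without the Python headers. The sketch below is only an illustration of the convention: get_encoded is a hypothetical stand-in for getargs handling "es#", and malloc/free stand in for PyMem_Malloc/PyMem_Free. It is not the actual getargs implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the "es#" buffer convention; not a Python API.
   If *buffer is NULL, a buffer is allocated and the caller must free it.
   Otherwise *length is taken as the capacity of the caller's buffer.
   Returns 1 on success, 0 on failure. */
int get_encoded(const char *src, char **buffer, int *length)
{
    size_t n = strlen(src);
    assert(buffer != NULL && length != NULL);
    if (*buffer == NULL) {
        *buffer = malloc(n + 1);        /* callee allocates */
        if (*buffer == NULL)
            return 0;
    } else if (*length < 0 || (size_t)*length < n + 1) {
        return 0;                       /* caller's buffer is too small */
    }
    memcpy(*buffer, src, n + 1);        /* copy including the trailing NUL */
    *length = (int)n;                   /* report the length actually used */
    return 1;
}
```

With the pointer set to NULL beforehand, the caller never has to guess a size; the cost is the one free call afterwards.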
Regards,
Martin