[Python-Dev] Unicode support in getargs.c

Martin v. Loewis martin@v.loewis.de
Wed, 2 Jan 2002 23:51:17 +0100


> I have a number of MacOSX API's that expect Unicode buffers, passed as 
> "long count, UniChar *buffer". 

Well, my first question would be: are you sure that UniChar has the
same underlying integral type as Py_UNICODE? If not, you lose: the
Py_UNICODE buffer cannot be passed to those APIs as-is, so you may
need to do even more conversion.
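
As a rough sketch of what that extra conversion could look like, assume
UniChar is a 16-bit unsigned type and Py_UNICODE happens to be 32 bits
wide in your build. Both typedefs and the function name below are
stand-ins invented for illustration, not the real headers; the buffer
would then have to be re-encoded as UTF-16 before being handed to the API:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for illustration only: the real UniChar comes from the Mac
   headers, and the real Py_UNICODE width depends on how Python was built. */
typedef uint16_t demo_UniChar;   /* assumed 16-bit */
typedef uint32_t demo_PyUnicode; /* assumed 32-bit */

/* Re-encode `n` 32-bit code points as UTF-16 into `dst`.
   Returns the number of 16-bit units written, or (size_t)-1 if `dst`
   (capacity `cap` units) is too small or a code point is out of range.
   Values below 0x10000 are copied through unchanged. */
static size_t demo_to_unichar(const demo_PyUnicode *src, size_t n,
                              demo_UniChar *dst, size_t cap)
{
    size_t i, out = 0;
    for (i = 0; i < n; i++) {
        uint32_t c = src[i];
        if (c > 0x10FFFF)
            return (size_t)-1;
        if (c < 0x10000) {
            if (out + 1 > cap)
                return (size_t)-1;
            dst[out++] = (demo_UniChar)c;
        } else {
            /* split into a high and a low surrogate */
            if (out + 2 > cap)
                return (size_t)-1;
            c -= 0x10000;
            dst[out++] = (demo_UniChar)(0xD800 | (c >> 10));
            dst[out++] = (demo_UniChar)(0xDC00 | (c & 0x3FF));
        }
    }
    return out;
}
```

If the two types do have the same width, none of this is needed and the
pointer can be passed through with a cast.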

> I have the machinery in bgen to generate code for this, iff "u#" (or
> something else) would work the same as "s#", i.e. it returns you a
> pointer and a size, and it would work equally well for unicode
> objects as for classic strings (after conversion).

I see. u# could be made to work for Unicode objects alone, but it would
have to reject string objects.

> But as a general solution it doesn't look right: "How do I call a C 
> routine with a string parameter?" "Use the "s" format and you get the 
> string pointer to pass". "How do I call a C routine with a unicode string 
> parameter?" 

For that, the answer is u. But you want the length also. So for that,
the answer is u#. But your question is "How do I call a C routine with
either a Unicode object or a string object, getting a reasonable
Py_UNICODE* and the length?".

For that, I'd recommend using O&, with a conversion function:

/* O& converter: stores a new reference to a Unicode object in *result;
   returns 1 on success, 0 (with an exception set) on failure. */
int Py_UnicodeOrString(PyObject *o, void *result){
  PyObject **out = (PyObject **)result;
  if (PyUnicode_Check(o)){
    Py_INCREF(o);
    *out = o;
    return 1;
  }
  if (PyString_Check(o)){
    /* decodes the string; yields NULL with an exception set on failure */
    *out = PyUnicode_FromObject(o);
    return *out != NULL;
  }
  PyErr_SetString(PyExc_TypeError,"unicode object expected");
  return 0;
}

> "Use O and PyUnicode_FromObject() and PyUnicode_AsUnicode and 
> make sure you get all your decrefs right and.....".

With the function above, this becomes

Use O&, passing the function and the address of a PyObject* variable,
then use PyUnicode_AS_UNICODE and PyUnicode_GET_SIZE on the result,
performing a single DECREF at the end [allowing the caller to specify
an encoding is optional]

In this scenario, somebody *has* to deallocate memory; you cannot get
around this. It is your choice whether you call Py_DECREF or PyMem_Free
(as with the "e..." conversions such as "es#"); the DECREF is more
efficient, as it does not copy an object that is already Unicode.

> The "es#" is a very strange beast, and a similar "eu#" would help me a 
> little, but it has some serious drawbacks. Aside from it being completely 
> different from the other converters (being a prefix operator in stead of a 
> postfix one, and having a value-return argument) I would also have to 
> pre-allocate the buffer in advance, and that sort of defeats the purpose.

You don't. If you set the buffer pointer to NULL before invoking
getargs, it allocates a buffer of the needed size for you; you then
have to PyMem_Free it afterwards.
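
To make that contract concrete, here is a small stand-alone model of it
in plain C. The function name demo_fill and its return codes are
hypothetical and this is not the getargs.c code; it only mirrors the
two modes: a NULL buffer pointer means "allocate for me", a non-NULL
one means "use my buffer, whose capacity is in *length".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the "es#" buffer contract.
   - *buffer == NULL: allocate a buffer of the needed size; the caller
     frees it afterwards.
   - *buffer != NULL: copy into the caller's buffer, whose capacity is
     passed in via *length.
   Returns 0 on success and stores the data length (without the
   terminating NUL) in *length; returns -1 on failure. */
static int demo_fill(const char *data, char **buffer, int *length)
{
    int needed = (int)strlen(data);
    if (*buffer == NULL) {
        *buffer = malloc((size_t)needed + 1);
        if (*buffer == NULL)
            return -1;          /* out of memory */
    } else if (*length < needed + 1) {
        return -1;              /* caller's buffer is too small */
    }
    memcpy(*buffer, data, (size_t)needed + 1);
    *length = needed;
    return 0;
}
```

With the real converter the allocation comes from Python's allocator,
so the matching release call is PyMem_Free rather than free.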

Regards,
Martin