[Python-Dev] unicode/string asymmetries

Martin v. Loewis martin@v.loewis.de
Tue, 8 Jan 2002 21:24:57 +0100


> I noticed several unicode/string asymmetries:
> 
> 1. No support for unicode in the struct and array modules.
> Is this an oversight?

I'd call it intentional. What exactly would you like to happen?

> 2. What would be the corresponding unicode format character for 'z'
> in the struct module (string or None)?

You mean, in getargs? There is no corresponding thing.

I'd recommend against adding new formats. Instead, I'd propose to add
new conversion functions:

  Py_UNICODE *str;
  PyArg_ParseTuple(args, "O&", PyArg_UnicodeZ, &str);

int PyArg_UnicodeZ(PyObject *o, void *d){
  Py_UNICODE **dest = (Py_UNICODE **)d;
  if (o == Py_None) {
    *dest = NULL;
    return 1;
  }
  if (PyUnicode_Check(o)){
    *dest = PyUnicode_AS_UNICODE(o);
    return 1;
  }
  PyErr_SetString(PyExc_TypeError, "unicode or None expected");
  return 0;
}

It may be desirable to allow passing the ':' (function name) or ';'
(error message) strings from the format on to conversion functions,
and to add a helper API for formatting the errors.
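
As a rough illustration of what such an error-formatting helper might
look like, here is a standalone C sketch. It does not use Python.h, and
format_arg_error is a hypothetical name, not an existing API. It only
assumes the usual getargs conventions: text after ':' is the function
name used in error messages, and text after ';' replaces the default
message entirely.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the Python C API.
 * Builds an error message for a failed conversion from the trailing
 * part of a getargs-style format string:
 *   "O&;custom message"  -> the custom message is used verbatim
 *   "O&:funcname"        -> "funcname() argument must be <expected>"
 *   "O&"                 -> "argument must be <expected>"
 */
void format_arg_error(const char *format, const char *expected,
                      char *buf, size_t buflen)
{
    const char *p;

    /* ';' takes priority: the caller supplied the whole message. */
    if ((p = strchr(format, ';')) != NULL) {
        snprintf(buf, buflen, "%s", p + 1);
        return;
    }
    /* ':' supplies only the function name for the default message. */
    if ((p = strchr(format, ':')) != NULL) {
        snprintf(buf, buflen, "%s() argument must be %s", p + 1, expected);
        return;
    }
    /* Neither suffix present: fall back to a generic message. */
    snprintf(buf, buflen, "argument must be %s", expected);
}
```

A conversion function such as PyArg_UnicodeZ could then call a helper
like this and pass the resulting buffer to PyErr_SetString instead of
hard-coding its message, provided the format string were made available
to it.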

> 3. There does not seem to be an equivalent to the 's' format character
> for PyArg_Parse() or Py_BuildValue().

That would be 'u'. However, is this really needed? PyArg_Parse is
deprecated, and I doubt you have Py_UNICODE* often enough to need
it to pass to Py_BuildValue.

Regards,
Martin