[Python-Dev] Unicode support in getargs.c

Jack Jansen Jack.Jansen@cwi.nl
Tue, 1 Jan 2002 23:54:10 +0100


I posted a question on Unicode support in getargs.c last month (working
on a different project), but now that I'm trying to support
unicode-based APIs more seriously I find that it leaves even more to be
desired. I'd like to help to fix this, but I need some direction on
how things should be fixed.

Here are some of the issues I ran into today:
- Unicode objects have a companion string object, meaning that you can
  pass a unicode object to an "s" format and have the right thing happen.
  String objects have no such accompanying unicode object, and I think they
  should have one. Right now you cannot pass a string object when the C
  routine expects a unicode object.
- There is no unicode equivalent of "c", the single character.
- "u#" does something useful, but something completely different from
  what "s#" does. More to the point, it probably does something
  dangerous, if I understand correctly. If I write a C routine with a
  "u#" format and the Python code passes a string object, the string object
  will be used as a buffer object and its binary contents will be interpreted
  as unicode. If the argument in question is a filename this will produce
  very surprising results:-)
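
To make the "u#" concern concrete, here is a minimal standalone sketch in
plain C. It does not use the Python API at all, and both helper names are
hypothetical, invented for illustration. It assumes 16-bit unicode code
units (a wider Py_UNICODE would garble the data in the same way, just in
bigger steps). The first helper mimics what happens when the bytes of an
8-bit string are handed over as if they were already unicode; the second
does what I would presumably have wanted, widening each ASCII byte to one
code unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of getargs.c: copy a byte buffer into
   16-bit code units without any conversion, the way a raw buffer
   reinterpretation would. Returns the number of code units written. */
size_t bytes_as_code_units(const char *bytes, size_t nbytes,
                           uint16_t *out, size_t outcap)
{
    size_t n = nbytes / 2;              /* a trailing odd byte is dropped */

    assert(bytes != NULL && out != NULL);
    if (n > outcap)
        n = outcap;
    memcpy(out, bytes, n * sizeof(uint16_t));
    return n;
}

/* Hypothetical helper: widen each 7-bit ASCII byte to one code unit.
   Returns the number of code units written, or (size_t)-1 if the input
   contains a non-ASCII byte (the encoding is then unknown). */
size_t ascii_to_code_units(const char *bytes, size_t nbytes,
                           uint16_t *out, size_t outcap)
{
    size_t i;

    assert(bytes != NULL && out != NULL);
    for (i = 0; i < nbytes && i < outcap; i++) {
        unsigned char c = (unsigned char)bytes[i];
        if (c > 0x7F)
            return (size_t)-1;
        out[i] = c;
    }
    return i;
}
```

With the 8-byte filename "spam.txt", the first helper yields 4 code units,
none of which is the letter 's' (each unit is two letters packed together,
in an order that depends on the byte order of the machine). The second
helper yields the expected 8 code units.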

To sum things up, I'd like unicode objects to get a little more first-class
citizenship, especially in light of operating systems that are primarily (or
exclusively) unicode based, such as Mac OS X or Windows CE.