[Python-Dev] Unicode support in getargs.c
Jack Jansen
Jack.Jansen@cwi.nl
Tue, 1 Jan 2002 23:54:10 +0100
I posted a question on Unicode support in getargs.c last month (while
working on a different project), but now that I'm trying to support
unicode-based APIs more seriously I find that it leaves even more to be
desired. I'd like to help fix this, but I need some direction on
how it should be fixed.
Here are some of the issues I ran into today:
- Unicode objects have a companion string object, meaning that you can
pass a unicode object to an "s" format and have the right thing happen.
String objects have no such companion unicode object, and I think they
should have one. As it stands, you cannot pass a string object when the C
routine expects a unicode object.
- There is no unicode equivalent of "c", the single-character format.
- "u#" does something useful, but something completely different from
what "s#" does. More to the point, it probably does something
dangerous, if I understand correctly. If I write a C routine with a
"u#" format and the Python code passes a string object, the string object
will be used as a buffer object and its binary contents will be interpreted
as unicode. If the argument in question is a filename, this will produce
very surprising results :-)
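To make the "u#" concern concrete, here is a minimal sketch in Python rather than C. It is not what getargs.c does internally; it only mimics reinterpreting the raw bytes of an 8-bit string as 16-bit units. It assumes a 2-byte, little-endian Py_UNICODE, and the filename is made up for illustration.

```python
# Hypothetical illustration only: mimic a buffer-based "u#" that treats the
# bytes of an 8-bit string as if they were already unicode code units.
filename = b"data.txt"  # made-up filename, 8 bytes of ASCII

# Assumption: Py_UNICODE is 2 bytes wide and little-endian, so each pair
# of adjacent bytes is read as one 16-bit code unit.
misread = filename.decode("utf-16-le")

# The 8 bytes collapse into 4 unrelated characters.
print(len(filename), len(misread))
print([hex(ord(ch)) for ch in misread])
```

None of the resulting characters has anything to do with the original name, which is why silently accepting a string object here seems risky to me.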
To sum things up, I'd like unicode objects to get a little more first-class
citizenship, especially in light of operating systems that are primarily (or
exclusively) unicode based, such as Mac OS X or Windows CE.