[Python-Dev] Unicode support in getargs.c

Jack Jansen Jack.Jansen@cwi.nl
Wed, 2 Jan 2002 22:46:46 +0100 (CET)


On Wed, 2 Jan 2002, Martin v. Loewis wrote:

> > Jack will probably also need a way to say "decode this encoded
> > object into Unicode using the encoding xyz". Something like the
> > Unicode version of "es#". How about "eu#" which then passes through
> > Unicode as-is while decoding all other objects according to the
> > given encoding ?!
> 
> I'd like to see the requirements, in terms of real-world problems,
> before considering any extensions.

I have a number of Mac OS X APIs that expect Unicode buffers, passed as 
"long count, UniChar *buffer". I have the machinery in bgen to generate 
code for this, iff "u#" (or something else) works the same way as "s#", 
i.e. it returns a pointer and a size, and works equally well for unicode 
objects and for classic strings (after conversion).

The trick of using "O" with PyUnicode_FromObject() may work for me, as my 
code is generated, so a little more glue doesn't really matter. 
But as a general solution it doesn't look right: "How do I call a C 
routine with a string parameter?" "Use the "s" format and you get the 
string pointer to pass." "How do I call a C routine with a unicode string 
parameter?" "Use "O" and PyUnicode_FromObject() and PyUnicode_AsUnicode() 
and make sure you get all your decrefs right and...".

The "es#" is a very strange beast, and a similar "eu#" would help me a 
little, but it has some serious drawbacks. Aside from it being completely 
different from the other converters (being a prefix operator instead of a 
postfix one, and having a value-return argument), I would also have to 
pre-allocate the buffer, and that sort of defeats the purpose.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@cwi.nl      | ++++ if you agree copy these lines to your sig ++++
http://www.cwi.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm