[Python-Dev] Unicode and Windows

M.-A. Lemburg mal@lemburg.com
Fri, 24 Mar 2000 12:29:12 +0100

Greg Stein wrote:
> On Fri, 24 Mar 2000, M.-A. Lemburg wrote:
> >...
> >   "s":  For Unicode objects: auto convert them to the <default encoding>
> >         and return a pointer to the object's <defencstr> buffer.
> Guess that I didn't notice this before, but it seems weird that "s" and
> "s#" return different encodings.
> Why?

This is due to the buffer interface being used for "s#". Since
"s#" refers to the getreadbuf slot, it returns the raw data,
which in this case is UTF-16 in platform-dependent byte order.

"s" relies on NUL-terminated strings and doesn't use the
buffer interface at all. Thus "s" returns NUL-terminated
UTF-8: UTF-16 can't be used there because it is full of NUL
bytes (every ASCII character has a zero high byte).

"t#" uses the getcharbuf slot and thus should return character
data. UTF-8 is the right encoding here.
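To make the NUL problem concrete, here is a small C sketch (not
Python's actual conversion code) that writes ASCII text as UTF-16
in little-endian order, i.e. what a little-endian machine would
expose through "s#". The helper ascii_to_utf16le() is hypothetical
and, as a simplifying assumption, accepts ASCII input only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: encode an ASCII-only
   C string as UTF-16 little-endian. The caller must supply at least
   2 * (strlen(s) + 1) bytes in out. Returns the number of bytes
   written, including the 2-byte terminator. */
static size_t ascii_to_utf16le(const char *s, unsigned char *out)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        assert((unsigned char)*s < 0x80);  /* ASCII only */
        out[n++] = (unsigned char)*s;      /* low byte: the character */
        out[n++] = 0;                      /* high byte: always zero here */
    }
    out[n++] = 0;                          /* 16-bit terminator */
    out[n++] = 0;
    return n;
}
```

Encoding "abc" this way yields 8 bytes, but strlen() on the result
returns 1 because the second byte is already zero, while the UTF-8
form of the same text is plain "abc" with a strlen() of 3. Any C
code expecting a NUL-terminated char * would see the UTF-16 buffer
as a one-character string.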

> >   "es":
> >       Takes two parameters: encoding (const char **) and
> >       buffer (char **).
> >...
> >   "es#":
> >       Takes three parameters: encoding (const char **),
> >       buffer (char **) and buffer_len (int *).
> I see no reason to make the encoding (const char **) rather than
> (const char *). We are never returning a value, so this just makes it
> harder to pass the encoding into ParseTuple.
> There is precedent for passing in single-ref pointers. For example:
>   PyArg_ParseTuple(args, "O!", &PyString_Type, &s)
> I would recommend using just one pointer level for the encoding.

You have a point there... even though it breaks the notion
of prefixing all parameters with an '&' (ok, except the
type check one). OTOH, it would allow passing the encoding
directly in the PyArg_ParseTuple() call, which probably makes
more sense in this context.

I'll change it...
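A toy sketch of what the one-pointer-level convention buys the
caller. toy_parse_es() below is hypothetical and is NOT
PyArg_ParseTuple(); it only mimics an "es"-style converter,
reading the encoding as a plain const char * (an input) and the
buffer as char ** (an output the caller owns). For simplicity it
assumes the input text is already UTF-8 and accepts no other
encoding:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an "es"-style converter, for
   illustration only. The encoding is an input, so it is read with
   a single pointer level; the buffer is an output, so the caller
   passes the address of its own char * variable. */
static int toy_parse_es(const char *text, const char *format, ...)
{
    va_list ap;
    const char *encoding;
    char **buffer;

    if (strcmp(format, "es") != 0)
        return 0;

    va_start(ap, format);
    encoding = va_arg(ap, const char *);   /* input: one level */
    buffer = va_arg(ap, char **);          /* output: two levels */
    va_end(ap);

    /* Toy assumption: text is already UTF-8, so only "utf-8" is
       accepted and the bytes are copied unchanged. */
    if (strcmp(encoding, "utf-8") != 0)
        return 0;

    *buffer = malloc(strlen(text) + 1);    /* caller must free() */
    if (*buffer == NULL)
        return 0;
    strcpy(*buffer, text);
    return 1;
}
```

With this convention a caller can write
toy_parse_es(text, "es", "utf-8", &buf) and pass the encoding as a
literal; with a (const char **) parameter it would first need a
variable such as const char *enc = "utf-8"; just to take &enc.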

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/