[Python-Dev] Unicode and Windows
Fri, 24 Mar 2000 12:29:12 +0100
Greg Stein wrote:
> On Fri, 24 Mar 2000, M.-A. Lemburg wrote:
> > "s": For Unicode objects: auto convert them to the <default encoding>
> > and return a pointer to the object's <defencstr> buffer.
> Guess that I didn't notice this before, but it seems weird that "s" and
> "s#" return different encodings.
This is due to the buffer interface being used for "s#". Since
"s#" refers to the getreadbuf slot, it returns raw data. In
this case this is UTF-16 in platform-dependent byte order.
"s" relies on NUL-terminated strings and doesn't use the
buffer interface at all. Thus "s" returns NUL-terminated
UTF-8 (UTF-16 data is full of NUL bytes).
"t#" uses the getcharbuf slot and thus should return character
data. UTF-8 is the right encoding here.
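To illustrate why raw UTF-16 cannot be handed out as a NUL-terminated C string, here is a minimal sketch in plain C. It does not use the Python C API; the helper names ascii_to_utf16_native and c_string_length are hypothetical and exist only for this illustration, and the input is assumed to be pure ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of any Python API): widen ASCII text to
 * UTF-16 code units in the platform's native byte order, including a
 * terminating 16-bit zero. Returns the number of payload bytes written,
 * not counting the terminator. */
size_t ascii_to_utf16_native(const char *ascii, unsigned char *out,
                             size_t out_size)
{
    size_t n = strlen(ascii);
    size_t i;

    /* The caller must supply room for n code units plus the terminator. */
    assert(out_size >= (n + 1) * sizeof(uint16_t));

    for (i = 0; i <= n; i++) {
        /* Assumption: every input byte is ASCII (below 128). */
        uint16_t unit = (uint16_t)(unsigned char)ascii[i];
        memcpy(out + i * sizeof unit, &unit, sizeof unit);
    }
    return n * sizeof(uint16_t);
}

/* Length that a consumer expecting a NUL-terminated char string would see. */
size_t c_string_length(const unsigned char *buf)
{
    return strlen((const char *)buf);
}
```

For the text "abc" the UTF-16 buffer holds six payload bytes, yet c_string_length() stops at the first zero byte: after one byte on a little-endian machine and immediately on a big-endian one. The same text in UTF-8 is just the three ASCII bytes, so it survives as an ordinary C string.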
> > "es":
> > Takes two parameters: encoding (const char **) and
> > buffer (char **).
> > "es#":
> > Takes three parameters: encoding (const char **),
> > buffer (char **) and buffer_len (int *).
> I see no reason to make the encoding (const char **) rather than
> (const char *). We are never returning a value, so this just makes it
> harder to pass the encoding into ParseTuple.
> There is precedent for passing in single-ref pointers. For example:
> PyArg_ParseTuple(args, "O!", &PyString_Type, &s)
> I would recommend using just one pointer level for the encoding.
You have a point there... even though it breaks the notion
of prepending all parameters with an '&' (ok, except the
type check one). OTOH, it would allow passing the encoding
right with the PyArg_ParseTuple() call which probably makes
more sense in this context.
I'll change it...
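A rough sketch of what the single-pointer variant means for callers, written as a stand-alone mock rather than the real PyArg_ParseTuple(). The function mock_parse and its behaviour (it only accepts "utf-8" and copies the input unchanged) are assumptions made up for this illustration; no actual encoding conversion takes place.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parser handling an "es"-style format code.
 * It reads two variadic arguments:
 *   const char *encoding  - input only, so a single pointer level suffices
 *   char **buffer         - output slot, so the caller passes &buf
 * Returns 1 on success and 0 on failure. On success *buffer points to a
 * malloc()ed copy of input that the caller must free(). */
int mock_parse(const char *input, const char *format, ...)
{
    va_list va;
    int ok = 0;

    va_start(va, format);
    if (strcmp(format, "es") == 0) {
        const char *encoding = va_arg(va, const char *);
        char **buffer = va_arg(va, char **);

        /* Assumption for the sketch: only "utf-8" is recognised, and the
         * input is treated as already being in that encoding. */
        if (strcmp(encoding, "utf-8") == 0) {
            size_t n = strlen(input) + 1;
            *buffer = malloc(n);
            if (*buffer != NULL) {
                memcpy(*buffer, input, n);
                ok = 1;
            }
        }
    }
    va_end(va);
    return ok;
}
```

With this shape the encoding can be written inline, as in mock_parse(text, "es", "utf-8", &buf). Under the (const char **) design the caller would first need a named variable holding the encoding and would then pass its address, which is the extra step Greg objected to.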
Python Pages: http://www.lemburg.com/python/