[Python-Dev] Unicode and Windows
M.-A. Lemburg
mal@lemburg.com
Fri, 24 Mar 2000 11:37:53 +0100
Ok, I've just added two new parser markers to PyArg_ParseTuple()
which will hopefully make life a little easier for extension
writers.
The new code will be in the next patch set which I will release
early next week.
Here are the docs:
Internal Argument Parsing:
--------------------------
These markers are used by the PyArg_ParseTuple() APIs:
  "U":  Check for Unicode object and return a pointer to it
  "s":  For Unicode objects: auto convert them to the <default encoding>
        and return a pointer to the object's <defencstr> buffer.
  "s#": Access to the Unicode object via the bf_getreadbuf buffer interface
        (see Buffer Interface); note that the length relates to the buffer
        length, not the Unicode string length (this may be different
        depending on the Internal Format).
  "t#": Access to the Unicode object via the bf_getcharbuf buffer interface
        (see Buffer Interface); note that the length relates to the buffer
        length, not necessarily to the Unicode string length (this may
        be different depending on the <default encoding>).
  "es":
        Takes two parameters: encoding (const char **) and
        buffer (char **).
        The input object is first coerced to Unicode in the usual way
        and then encoded into a string using the given encoding.
        On output, a buffer of the needed size is allocated and
        returned through *buffer as a NULL-terminated string.
        The encoded string may not contain embedded NULL characters.
        The caller is responsible for free()ing the allocated *buffer
        after usage.
  "es#":
        Takes three parameters: encoding (const char **),
        buffer (char **) and buffer_len (int *).
        The input object is first coerced to Unicode in the usual way
        and then encoded into a string using the given encoding.
        If *buffer is non-NULL, *buffer_len must be set on input to the
        size in bytes of the buffer that *buffer points to (the sizeof()
        of the array, not of the pointer). Output is then copied to
        *buffer.
        If *buffer is NULL, a buffer of the needed size is
        allocated and output copied into it. *buffer is then
        updated to point to the allocated memory area. The caller
        is responsible for free()ing *buffer after usage.
        In both cases *buffer_len is updated to the number of
        characters written (excluding the trailing NULL-byte).
        The output buffer is assured to be NULL-terminated.
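To make the two "es#" modes a bit more concrete, here is a small
standalone model of the output step. This is only a sketch and not the
code in PyArg_ParseTuple(): the function name store_encoded() and its
return convention are made up for illustration, and the handling of a
pre-allocated buffer that is too small is an assumption, since the docs
above do not cover that case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the "es#" output step; NOT the actual
   PyArg_ParseTuple() code.  `encoded` / `encoded_len` play the role of
   the already encoded string.  Returns 0 on success, -1 on failure. */
static int
store_encoded(const char *encoded, int encoded_len,
              char **buffer, int *buffer_len)
{
    assert(encoded != NULL && buffer != NULL && buffer_len != NULL);
    assert(encoded_len >= 0);

    if (*buffer == NULL) {
        /* Auto-allocation: the caller has to free() *buffer later. */
        *buffer = malloc((size_t)encoded_len + 1);
        if (*buffer == NULL)
            return -1;
    }
    else if (*buffer_len < encoded_len + 1) {
        /* Assumption: a pre-allocated buffer that cannot hold the
           data plus the terminator is treated as an error. */
        return -1;
    }

    memcpy(*buffer, encoded, (size_t)encoded_len);
    (*buffer)[encoded_len] = '\0';  /* output is always terminated */
    *buffer_len = encoded_len;      /* excludes the trailing NULL byte */
    return 0;
}
```

With *buffer set to NULL the function allocates, with *buffer pointing
at a caller-owned array it copies in place; in both cases *buffer_len
ends up holding the length without the terminator.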
Examples:
Using "es#" with auto-allocation:
    static PyObject *
    test_parser(PyObject *self,
                PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;    /* NULL requests auto-allocation */
        int buffer_len = 0;

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              &encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError,
                            "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        free(buffer);           /* the caller owns the allocated buffer */
        return str;
    }
Using "es" with auto-allocation returning a NULL-terminated string:
    static PyObject *
    test_parser(PyObject *self,
                PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;    /* "es" always allocates the buffer */

        if (!PyArg_ParseTuple(args, "es:test_parser",
                              &encoding, &buffer))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError,
                            "buffer is NULL");
            return NULL;
        }
        /* No length is returned; rely on the NULL terminator. */
        str = PyString_FromString(buffer);
        free(buffer);
        return str;
    }
Using "es#" with a pre-allocated buffer:
    static PyObject *
    test_parser(PyObject *self,
                PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char _buffer[10];
        char *buffer = _buffer;             /* non-NULL: copy in place */
        int buffer_len = sizeof(_buffer);   /* size of the array on input */

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              &encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError,
                            "buffer is NULL");
            return NULL;
        }
        /* buffer_len now holds the number of characters written;
           _buffer lives on the stack, so there is nothing to free(). */
        str = PyString_FromStringAndSize(buffer, buffer_len);
        return str;
    }
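Coming back to the remark on "s#" and "t#" that the reported length is
a buffer length and not the Unicode string length: the following
standalone sketch shows how the two can differ. It does not use the
Python API at all. The helper names are hypothetical, the two-byte
internal format is assumed purely for illustration, and UTF-8 is
assumed as the encoding seen through the character buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte length of a raw buffer holding `nchars` characters, assuming
   (for illustration only) a fixed two-byte internal format. */
static size_t
raw_buffer_len(size_t nchars)
{
    return nchars * sizeof(uint16_t);
}

/* Number of characters in `nbytes` bytes of UTF-8 data: every byte
   that is not a continuation byte (10xxxxxx) starts a character. */
static size_t
utf8_char_count(const char *s, size_t nbytes)
{
    size_t i, count = 0;

    assert(s != NULL);
    for (i = 0; i < nbytes; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Under these assumptions a five character string occupies ten bytes in
the raw buffer, and a UTF-8 buffer of five bytes may hold fewer than
five characters once non-ASCII data is involved.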
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/