Unicode problem in ucs4
John Machin
sjmachin at lexicon.net
Mon Mar 23 03:41:10 EDT 2009
On Mar 23, 6:18 pm, abhi <abhigyan_agra... at in.ibm.com> wrote:
[snip]
> Hi Mark,
> Thanks for the help. I tried PyUnicode_AsWideChar() but I am
> getting the same result i.e. only the first letter.
>
> sample code:
>
> #include<Python.h>
>
> static PyObject *unicode_helper(PyObject *self,PyObject *args){
> PyObject *sampleObj = NULL;
> wchar_t *sample = NULL;
> int size = 0;
>
> if (!PyArg_ParseTuple(args, "O", &sampleObj)){
> return NULL;
> }
>
> // use wide char function
> size = PyUnicode_AsWideChar(databaseObj, sample,
> PyUnicode_GetSize(databaseObj));
What is databaseObj??? Copy/paste the *actual* code that you compiled
and ran.
> printf("%d chars are copied to sample\n", size);
> wprintf(L"database value after unicode conversion is : %s\n",
> sample);
> return Py_BuildValue("");
>
> }
>
> static PyMethodDef funcs[]={{"unicodeTest",(PyCFunction)
> unicode_helper,METH_VARARGS,"test ucs2, ucs4"},{NULL}};
>
> void initunicodeTest(void){
> Py_InitModule3("unicodeTest",funcs,"");
>
> }
>
> This prints the following when input value is given as "test":
> 4 chars are copied to sample
> database value after unicode conversion is : t
[presuming little-endian] The ucs4 string will look like
"t\0\0\0e\0\0\0s\0\0\0t\0\0\0" in memory. I suspect that your wprintf is
grokking only 16-bit doodads: "t\0" is printed and then "\0\0" is taken as
end-of-string. Try your wprintf on sample[0], ..., sample[3] in a loop
and see what you get. Use bog-standard printf to print the hex
representation of each of the 16 bytes starting at the address sample
is pointing to.
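
A minimal sketch of that diagnostic, in case it helps. The function name
dump_wide is made up for illustration and is not part of the Python C API.
It uses plain printf throughout (printing each element as a code point
rather than calling wprintf), since mixing narrow and wide output calls on
the same stream is not reliable in standard C. It assumes the buffer
really holds nchars valid wchar_t elements.

```c
#include <stdio.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: print each wide character of s as a code point,
   then dump the underlying bytes in hex. Returns the number of bytes
   dumped, i.e. nchars * sizeof(wchar_t). */
size_t dump_wide(const wchar_t *s, size_t nchars)
{
    const unsigned char *bytes = (const unsigned char *)s;
    size_t nbytes = nchars * sizeof(wchar_t);
    size_t i;

    /* One line per element, so a truncated string is easy to spot. */
    for (i = 0; i < nchars; i++)
        printf("sample[%lu] = U+%04lX\n",
               (unsigned long)i, (unsigned long)s[i]);

    /* Raw memory, byte by byte, starting at the address s points to. */
    for (i = 0; i < nbytes; i++)
        printf("%02x ", bytes[i]);
    printf("\n");

    return nbytes;
}
```

Called as dump_wide(sample, size) after the conversion, this shows both
how wide each element is and what is actually in the buffer. On a
little-endian build with a 4-byte wchar_t, "test" should come out as
74 00 00 00 65 00 00 00 73 00 00 00 74 00 00 00.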