[Python-Dev] getting the UCS-2 representation of a unicode object

Martin v. Loewis martin@v.loewis.de
19 May 2002 21:03:12 +0200


"Andreas Jung" <andreas@andreas-jung.com> writes:

> Sounds reasonable, but since PyArg_ParseTuple() only applies to
> function arguments it cannot be used to convert a unicode object to
> UCS-2. So what is the easiest way to get the UCS-2 representation?
> PyUnicode_AS_DATA() returns for u'computer' a char * with
> strlen()==1, however PyUnicode_GET_DATA_SIZE() on the same string
> returns 16 (looks fine for the two bytes encoding of UCS-2). Am I
> missing something?

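As Fredrik explains, you are getting what I believe you mean by
"UCS-2": you get the internal representation, which in your build
most likely uses unsigned short as Py_UNICODE. strlen() reports 1
because, on a little-endian machine, the first character is stored
as the bytes 'c' and NUL, and strlen() stops at that NUL.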
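The size/strlen() mismatch can be reproduced without Python. This is a
standalone sketch that assumes a build where Py_UNICODE is a 2-byte
unsigned short and a little-endian machine such as x86. It does not
include the Python headers: the typedef py_unicode_2 and the helpers
data_size() and byte_strlen() are hypothetical names standing in for the
internal buffer, PyUnicode_GET_DATA_SIZE() and strlen() applied to
PyUnicode_AS_DATA().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Py_UNICODE in a build where it is an
   unsigned short (2 bytes); the Python headers are not used here. */
typedef unsigned short py_unicode_2;

/* u'computer' as 2-byte code units, NUL-terminated like the
   internal buffer. */
static const py_unicode_2 computer[] = {'c', 'o', 'm', 'p', 'u', 't', 'e', 'r', 0};

/* Bytes occupied by the 8 characters, analogous to
   PyUnicode_GET_DATA_SIZE(): 8 * sizeof(py_unicode_2). */
size_t data_size(void)
{
    return (sizeof computer / sizeof computer[0] - 1) * sizeof(py_unicode_2);
}

/* strlen() over the same memory viewed as char *, analogous to
   strlen(PyUnicode_AS_DATA(...)); it stops at the first zero byte. */
size_t byte_strlen(void)
{
    return strlen((const char *)computer);
}
```

Under those assumptions data_size() is 16 (8 characters times 2 bytes)
and byte_strlen() is 1, because 'c' is stored as the bytes 0x63 0x00. On
a big-endian machine the first byte would be 0x00 and strlen() would
return 0. Either way, strlen() is the wrong tool for this buffer; the
size macros give the real length.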

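If you are really interested in UCS-2 data, you need to use
PyUnicode_EncodeUTF16, since the internal representation, when
interpreted as a byte sequence, may or may not be UCS-2: it is in the
machine's native byte order, and in a build configured for UCS-4 it
uses 4 bytes per character.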
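A minimal sketch of what such an explicit encoding step does, in plain C
so it compiles without Python. encode_utf16() is a hypothetical helper,
not PyUnicode_EncodeUTF16() itself: it takes code points as uint32_t, so
it behaves the same whether the text came from a 2-byte or a 4-byte
Py_UNICODE, and it writes a caller-chosen byte order without a BOM. (The
real PyUnicode_EncodeUTF16() takes the Py_UNICODE buffer, its length, an
errors string and a byteorder argument: -1 for little endian, 1 for big
endian, 0 for native order with a leading BOM.)

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Python C API.
   Encodes n code points as UTF-16 without a BOM; big_endian != 0
   selects UTF-16BE, otherwise UTF-16LE.
   Returns the number of bytes written, or (size_t)-1 if a code point
   is a lone surrogate or above U+10FFFF, or if out is too small. */
size_t encode_utf16(const uint32_t *cp, size_t n, int big_endian,
                    unsigned char *out, size_t out_size)
{
    size_t pos = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t c = cp[i];
        uint16_t units[2];
        size_t count;

        if (c < 0x10000) {
            if (c >= 0xD800 && c <= 0xDFFF)
                return (size_t)-1;          /* lone surrogate */
            units[0] = (uint16_t)c;         /* BMP: one unit, same as UCS-2 */
            count = 1;
        } else if (c <= 0x10FFFF) {
            c -= 0x10000;                   /* split into a surrogate pair */
            units[0] = (uint16_t)(0xD800 | (c >> 10));
            units[1] = (uint16_t)(0xDC00 | (c & 0x3FF));
            count = 2;
        } else {
            return (size_t)-1;              /* outside the Unicode range */
        }

        if (out_size - pos < 2 * count)
            return (size_t)-1;              /* output buffer too small */

        for (size_t k = 0; k < count; k++) {
            unsigned char hi = (unsigned char)(units[k] >> 8);
            unsigned char lo = (unsigned char)(units[k] & 0xFF);
            out[pos++] = big_endian ? hi : lo;
            out[pos++] = big_endian ? lo : hi;
        }
    }
    return pos;
}
```

For text within the Basic Multilingual Plane the output is exactly
UCS-2; u'computer' encodes to 16 bytes in either byte order. Characters
above U+FFFF become surrogate pairs, which UCS-2 cannot represent; a
strict UCS-2 encoder would reject them instead.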

Regards,
Martin