Unicode BOM marks
Francis Girard
francis.girard at free.fr
Tue Mar 8 04:01:00 EST 2005
Hi,
Thank you for your answer. That confirms what Martin v. Löwis says: when
Python is compiled, you can choose between UCS-2 and UCS-4 for the internal
unicode representation.
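
For what it's worth, a quick way to see which of the two a given interpreter was built with is to look at sys.maxunicode, as the quoted reference text suggests. This is only a minimal sketch; the function name unicode_build_width is made up for illustration. On recent Python 3 releases (3.3 and later) the value is always 0x10FFFF, so the check is only informative on older builds.

```python
import sys

def unicode_build_width():
    """Guess the internal width from sys.maxunicode (hypothetical helper)."""
    # A narrow (UCS-2) build reports 0xFFFF; a wide (UCS-4) build reports 0x10FFFF.
    return "UCS-2" if sys.maxunicode == 0xFFFF else "UCS-4"

print(hex(sys.maxunicode), unicode_build_width())
```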
Francis Girard
On Tuesday 8 March 2005 00:44, Jeff Epler wrote:
> On Mon, Mar 07, 2005 at 11:56:57PM +0100, Francis Girard wrote:
> > BTW, the python "unicode" built-in function documentation says it returns
> > a "unicode" string which scarcely means something. What is the python
> > "internal" unicode encoding ?
>
> The language reference says fairly little about unicode objects. Here's
> what it does say: [http://docs.python.org/ref/types.html#l2h-48]
> Unicode
> The items of a Unicode object are Unicode code units. A Unicode
> code unit is represented by a Unicode object of one item and can
> hold either a 16-bit or 32-bit value representing a Unicode
> ordinal (the maximum value for the ordinal is given in
> sys.maxunicode, and depends on how Python is configured at
> compile time). Surrogate pairs may be present in the Unicode
> object, and will be reported as two separate items. The built-in
> functions unichr() and ord() convert between code units and
> nonnegative integers representing the Unicode ordinals as
> defined in the Unicode Standard 3.0. Conversion from and to
> other encodings are possible through the Unicode method encode
> and the built-in function unicode().
>
> In terms of the CPython implementation, the PyUnicodeObject is laid out
> as follows:
> typedef struct {
>     PyObject_HEAD
>     int length;          /* Length of raw Unicode data in buffer */
>     Py_UNICODE *str;     /* Raw Unicode buffer */
>     long hash;           /* Hash value; -1 if not set */
>     PyObject *defenc;    /* (Default) Encoded version as Python
>                             string, or NULL; this is used for
>                             implementing the buffer protocol */
> } PyUnicodeObject;
> Py_UNICODE is some "C" integral type that can hold values up to
> sys.maxunicode (probably one of unsigned short, unsigned int, unsigned
> long, wchar_t).
>
> Jeff
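
The remark in the quoted reference text that surrogate pairs "will be reported as two separate items" is easier to follow with the UTF-16 arithmetic written out. The following is a rough sketch only, and to_surrogate_pair is a hypothetical helper name, not something Python provides. It shows how one code point above U+FFFF would be split into the two 16-bit code units that a UCS-2 build stores.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in the range U+10000..U+10FFFF")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1D11E)   # MUSICAL SYMBOL G CLEF
print(hex(high), hex(low))               # 0xd834 0xdd1e
```

On a build with 16-bit code units, such a character would therefore show up as two items in the unicode object rather than one.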