[Python-3000] string C API

"Martin v. Löwis" martin at v.loewis.de
Sat Sep 16 15:55:47 CEST 2006

Marcin 'Qrczak' Kowalczyk wrote:
>> You could play tricks with ob_size to save this field:
>> - ob_size < 0: 8-bit data; length is abs(ob_size)
>> - ob_size > 0, (ob_size & 1)==0: 16-bit data, length is ob_size/2
>> - ob_size > 0, (ob_size & 1)==1: 32-bit data, length is ob_size/2
> I wonder whether strings with characters outside ISO-8859-1 are common
> enough that having a 16-bit representation is worth the trouble.
> CLISP does have it. My language doesn't.

Unicode is designed so that all "living" scripts are encoded within
the BMP. Characters that need four bytes would therefore be extremely
rare, and one may argue that encoding them with UTF-16 (as surrogate
pairs) is good enough.
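
To illustrate what "encoding them with UTF-16" costs, here is a minimal
sketch in C of how a code point is turned into UTF-16 code units. The
function name utf16_encode and its return convention are hypothetical,
chosen only for this example; they are not part of any Python C API.

```c
#include <stdint.h>

/* Hypothetical helper (not a real Python API): encode one Unicode code
 * point as UTF-16. Writes 1 or 2 code units to out and returns how many
 * were written, or 0 if cp is not a valid Unicode scalar value. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        /* BMP: one code unit, except the surrogate range itself,
         * which does not represent characters. */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;

    /* Outside the BMP: split the 20-bit offset into a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

A BMP character such as U+20AC stays a single 16-bit unit, while U+1D11E
becomes the pair 0xD834 0xDD1E. Only the rare non-BMP characters pay for
the second unit.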

So if there is flexibility in the internal representation of strings,
I think a two-byte representation should definitely be one of the
options; I'd rather debate the necessity of the one-byte and
four-byte representations.
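
For reference, the ob_size trick quoted at the top could be sketched in
C roughly as follows. All names here (char_width, str_layout,
encode_ob_size, decode_ob_size) are hypothetical and exist only for
this illustration. The quoted scheme does not say how ob_size == 0 is
interpreted; this sketch assumes it decodes as an empty 16-bit string.

```c
#include <stddef.h>

/* Hypothetical types for illustration; not part of any real CPython API. */
typedef enum { WIDTH_8 = 1, WIDTH_16 = 2, WIDTH_32 = 4 } char_width;

typedef struct {
    char_width width;   /* bytes per code unit */
    ptrdiff_t  length;  /* number of code units */
} str_layout;

/* Pack the width and length into a single signed size field. */
static ptrdiff_t encode_ob_size(char_width width, ptrdiff_t length)
{
    switch (width) {
    case WIDTH_8:  return -length;         /* negative: 8-bit data */
    case WIDTH_16: return length * 2;      /* even:     16-bit data */
    default:       return length * 2 + 1;  /* odd:      32-bit data */
    }
}

/* Recover the width and length from the packed field.
 * Assumption: ob_size == 0 is treated as an empty 16-bit string,
 * so an empty 8-bit string does not round-trip its width. */
static str_layout decode_ob_size(ptrdiff_t ob_size)
{
    str_layout r;
    if (ob_size < 0) {
        r.width = WIDTH_8;
        r.length = -ob_size;
    } else if ((ob_size & 1) == 0) {
        r.width = WIDTH_16;
        r.length = ob_size / 2;
    } else {
        r.width = WIDTH_32;
        r.length = ob_size / 2;  /* integer division drops the tag bit */
    }
    return r;
}
```

Under this packing, a five-character string is stored with ob_size -5,
10, or 11 depending on whether it uses 8-bit, 16-bit, or 32-bit units.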

