[Python-3000] string C API
"Martin v. Löwis"
martin at v.loewis.de
Sat Sep 16 15:55:47 CEST 2006
Marcin 'Qrczak' Kowalczyk wrote:
>> You could play tricks with ob_size to save this field:
>>
>> - ob_size < 0: 8-bit data; length is abs(ob_size)
>> - ob_size > 0, (ob_size & 1)==0: 16-bit data, length is ob_size/2
>> - ob_size > 0, (ob_size & 1)==1: 32-bit data, length is ob_size/2
>
> I wonder whether strings with characters outside ISO-8859-1 are common
> enough that having a 16-bit representation is worth the trouble.
>
> CLISP does have it. My language doesn't.

Unicode is designed so that all "living" scripts are encoded within the
BMP (Basic Multilingual Plane). So characters needing four bytes would be
extremely rare, and one may argue that encoding them as UTF-16 surrogate
pairs is good enough.
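To illustrate what "encoding them with UTF-16" costs, here is a minimal sketch of how a single code point outside the BMP becomes a surrogate pair of two 16-bit units, while BMP characters stay one unit. The function name to_utf16_units is hypothetical and only for illustration.

```python
def to_utf16_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units for one Unicode code point (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    if code_point < 0x10000:
        # Inside the BMP: one 16-bit code unit holds the character directly.
        return [code_point]
    # Outside the BMP: subtract 0x10000 to get a 20-bit offset,
    # then split it into a high (lead) and low (trail) surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]
```

For example, U+1F600 becomes the pair 0xD83D, 0xDE00, so a rare non-BMP character simply occupies two slots in a two-byte string.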

So if there is flexibility in the internal representation of strings,
I think a two-byte representation should definitely be one of the
options; I'd rather debate the necessity of the one-byte and
four-byte representations.

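A minimal sketch, assuming a string is stored in the narrowest of the one-, two- and four-byte widths that fits all of its characters, and that the width is tagged in ob_size exactly as in the scheme quoted above. The helper names are hypothetical, not CPython API; treating ob_size == 0 as an empty 8-bit string is also an assumption, since the quoted scheme does not cover that case.

```python
def char_width(s: str) -> int:
    """Bytes per character needed to store every character of s (hypothetical)."""
    widest = max((ord(c) for c in s), default=0)
    if widest <= 0xFF:
        return 1   # fits ISO-8859-1 (Latin-1)
    if widest <= 0xFFFF:
        return 2   # fits the BMP
    return 4       # needs full 32-bit code points


def encode_ob_size(length: int, width: int) -> int:
    """Fold length and width into one ob_size value per the quoted scheme."""
    if width == 1:
        return -length          # negative: 8-bit data
    if width == 2:
        return length * 2       # positive and even: 16-bit data
    if width == 4:
        return length * 2 + 1   # positive and odd: 32-bit data
    raise ValueError(f"unsupported width {width}")


def decode_ob_size(ob_size: int) -> tuple[int, int]:
    """Recover (width, length) from ob_size; 0 is assumed to mean an empty 8-bit string."""
    if ob_size <= 0:
        return 1, -ob_size
    if ob_size & 1 == 0:
        return 2, ob_size // 2
    return 4, ob_size // 2
```

For instance, "abc" is stored with width 1 and ob_size -3, while a three-character string containing a Greek letter gets width 2 and ob_size 6.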
Regards,
Martin