[Python-Dev] Divorcing str and unicode (no more implicit conversions).
"Martin v. Löwis"
martin at v.loewis.de
Tue Oct 25 00:59:27 CEST 2005
Antoine Pitrou wrote:
>>There are many design alternatives:
>
> Wouldn't it be simpler to use:
> - one-byte representation if every character <= 0xFF
> - two-byte representation if every character <= 0xFFFF
> - four-byte representation otherwise
As I said: there are many alternatives. This one has the
disadvantage of requiring a copy every time you pass the string
to a Win32 function (which expects UTF-16).
Whether or not this is a significant disadvantage, I don't know.
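To make the trade-off concrete, here is a minimal sketch of the quoted three-tier scheme, in Python rather than C for brevity. The names storage_width and to_win32_buffer are hypothetical and not part of any existing API. The sketch only shows how the width would be chosen, and that a string held in the one-byte or four-byte form has to be re-encoded into a fresh UTF-16 buffer before a Win32 wide-character call.

```python
def storage_width(text):
    """Bytes per character under the quoted three-tier scheme (hypothetical)."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def to_win32_buffer(text):
    """Hypothetical helper: build the UTF-16 buffer a Win32 function expects.

    For the one-byte and four-byte forms this is necessarily a new buffer,
    i.e. the copy discussed above.
    """
    return text.encode("utf-16-le")


if __name__ == "__main__":
    for sample in ("abc", "\u20ac", "\U0001F600"):
        print(repr(sample), storage_width(sample), len(to_win32_buffer(sample)))
```

Under this scheme a plain ASCII string would be stored at one byte per character but still needs two bytes per character once handed to Win32, so the conversion cost is paid on every such call.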
In any case, a multi-representation implementation has the
disadvantage of making the C API more difficult to use, in
particular for writing codecs. On encoding, it is difficult
to fetch the individual characters which you need for the
lookup table; on decoding, it is difficult to know in advance
what representation to use (unless you know there is an upper
bound on the decoded character ordinals).
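The decoding problem can be illustrated with a rough sketch, again in Python and under assumptions of my own: decode_with_widening is a hypothetical table-driven decoder, and a plain list stands in for the real character buffer. It starts at the narrowest width and has to rebuild the buffer whenever a decoded ordinal does not fit, because the final width is not known until the input has been seen.

```python
def width_for(code_point):
    """Smallest of the three widths that can hold this ordinal."""
    if code_point <= 0xFF:
        return 1
    if code_point <= 0xFFFF:
        return 2
    return 4


def decode_with_widening(data, table):
    """Decode bytes through a 256-entry lookup table (hypothetical sketch).

    Returns (code_points, final_width, copies), where copies counts how
    often the buffer had to be rebuilt at a larger width.
    """
    width = 1
    copies = 0
    out = []
    for byte in data:
        code_point = table[byte]
        needed = width_for(code_point)
        if needed > width:
            # A C implementation would reallocate and copy every character
            # decoded so far; copying the list stands in for that here.
            out = list(out)
            width = needed
            copies += 1
        out.append(code_point)
    return out, width, copies


if __name__ == "__main__":
    # Identity table, except that byte 0x80 maps to U+20AC (as in cp1252).
    table = list(range(256))
    table[0x80] = 0x20AC
    print(decode_with_widening(b"a\x80b", table))
```

A codec that knows an upper bound on its ordinals (Latin-1, for instance) could pick the width up front and skip the widening step entirely.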
Regards,
Martin