[Python-3000] string C API
"Martin v. Löwis"
martin at v.loewis.de
Sat Sep 16 15:49:29 CEST 2006
Nick Coghlan wrote:
> The choice of latin-1 is deliberate and non-arbitrary. The reason for the
> choice is that the ordinals 0-255 in latin-1 map to the Unicode code points 0-255:
That's true, but it doesn't follow that this makes latin-1 a good
choice for a special case. What makes it a good choice is how
frequently that special case occurs.
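
For reference, the property Nick describes is easy to check. This is only a minimal sketch; the helper name is made up for illustration and is not part of any proposed API:

```python
def maps_bytes_to_same_code_points(encoding):
    """Return True if every byte value b decodes to the code point b."""
    for b in range(256):
        try:
            ch = bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            # The byte is undefined or incomplete in this encoding.
            return False
        if ch != chr(b):
            return False
    return True


print(maps_bytes_to_same_code_points("latin-1"))
print(maps_bytes_to_same_code_points("cp1252"))
```

latin-1 passes this check, while cp1252 does not (for example, byte 0x80 is the Euro sign there).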
> In effect, when creating the string, you would be doing something like this:
>
> if encoding == 'latin-1':
>     bytes_per_char = 1
>     code_points = 8_bit_data
> else:
>     code_points, max_code_point = decode_to_UCS4(8_bit_data, encoding)
>     if max_code_point < 256:
>         bytes_per_char = 1
>     elif max_code_point < 65536:
>         bytes_per_char = 2
>     else:
>         bytes_per_char = 4
Hardly. Instead, the codec would have to create the string of the right
width; a codec written in C would make two passes, rather than
temporarily allocating memory to actually represent the UCS-4 codes.
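
To illustrate the two-pass idea, here is a rough sketch in Python rather than C. The names `decode_two_pass` and `_code_points` are hypothetical, the little-endian packing is an arbitrary choice for the example, and a real codec would work on the raw buffers directly. The first pass only finds the character count and the largest code point; the second pass fills a buffer of the chosen width, so no intermediate UCS-4 array is kept.

```python
import codecs


def _code_points(data, encoding, chunk=4096):
    """Yield code points one by one, decoding the input in small chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for start in range(0, len(data), chunk):
        for ch in decoder.decode(data[start:start + chunk]):
            yield ord(ch)
    # Flush the decoder; this raises if the input ends mid-character.
    for ch in decoder.decode(b"", True):
        yield ord(ch)


def decode_two_pass(data, encoding):
    """Return (bytes_per_char, buffer) using the narrowest sufficient width."""
    # Pass 1: count characters and find the maximum code point.
    count = 0
    max_code_point = 0
    for cp in _code_points(data, encoding):
        count += 1
        if cp > max_code_point:
            max_code_point = cp

    if max_code_point < 256:
        width = 1
    elif max_code_point < 65536:
        width = 2
    else:
        width = 4

    # Pass 2: allocate the final buffer once and fill it in place.
    buf = bytearray(count * width)
    for i, cp in enumerate(_code_points(data, encoding)):
        buf[i * width:(i + 1) * width] = cp.to_bytes(width, "little")
    return width, buf
```

The cost is decoding the input twice; the benefit is that peak memory is the final string plus a small chunk, not four bytes per character.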
Regards,
Martin