[Python-3000] string C API
Antoine Pitrou
solipsis at pitrou.net
Fri Sep 15 20:04:33 CEST 2006
On Friday 15 September 2006 at 10:48 -0700, Josiah Carlson wrote:
> This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4:
You could replace "latin-1" with "one-byte system encoding chosen at
interpreter startup depending on locale".
There are lots of 8-bit encodings other than iso-8859-1.
(for example, my current locale uses iso-8859-15)
The algorithm for choosing the one-byte encoding could be:
- if the current locale uses a one-byte encoding, use that encoding
- otherwise, if the current locale's language has a popular one-byte encoding
(for many languages this would mean iso-8859-<X>), use that encoding
- otherwise, use no one-byte encoding
This would ensure that, for example, Russian text on a system configured
with a Russian locale does not always end up using two bytes per
character internally.
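
To make the selection rule above concrete, here is a rough sketch in
Python. It is only an illustration, not a proposed implementation: the
function names, the prefix list used to recognise one-byte encodings and
the language-to-encoding table are all my own assumptions, and the table
entries are examples rather than researched defaults.

```python
# Hypothetical prefix list: normalized names of encoding families that
# are one byte per character (assumption, deliberately incomplete).
ONE_BYTE_PREFIXES = ("iso8859", "latin", "koi8", "cp125")

# Hypothetical table mapping a language code to a popular one-byte
# encoding for that language (illustrative entries only).
POPULAR_ONE_BYTE = {
    "fr": "iso-8859-15",
    "de": "iso-8859-15",
    "pl": "iso-8859-2",
    "ru": "iso-8859-5",
}


def normalize(name):
    """Lowercase an encoding name and drop '-' and '_' separators."""
    return name.lower().replace("-", "").replace("_", "")


def is_one_byte(encoding):
    """Return True if the encoding belongs to a known one-byte family."""
    return normalize(encoding).startswith(ONE_BYTE_PREFIXES)


def choose_one_byte_encoding(locale_encoding, language):
    """Apply the three rules in order; return None for 'no one-byte encoding'."""
    # Rule 1: the locale itself already uses a one-byte encoding.
    if locale_encoding and is_one_byte(locale_encoding):
        return locale_encoding
    # Rule 2: fall back to a popular one-byte encoding for the language.
    if language:
        lang = language.split("_")[0].lower()
        fallback = POPULAR_ONE_BYTE.get(lang)
        if fallback:
            return fallback
    # Rule 3: nothing suitable.
    return None
```

At interpreter startup the two arguments could presumably be taken from
something like locale.getlocale(), which returns a (language, encoding)
pair; here they are plain parameters so the rule can be tried in
isolation. With this sketch, a Russian locale configured for UTF-8 would
still get a one-byte Cyrillic encoding through the second rule.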
Regards
Antoine.