[Python-3000] string C API

Antoine Pitrou solipsis at pitrou.net
Fri Sep 15 20:04:33 CEST 2006


On Friday 15 September 2006 at 10:48 -0700, Josiah Carlson wrote:
> This is one of the reasons why I was talking Latin-1, UCS-2, and UCS-4:

You could replace "latin-1" with "one-byte system encoding chosen at
interpreter startup depending on locale".
There are lots of 8-bit encodings other than iso-8859-1.
(for example, my current locale uses iso-8859-15)

The algorithm for choosing the one-byte encoding could be:
- if the current locale uses a one-byte encoding, use that encoding
- otherwise, if the current locale's language has a popular one-byte encoding
(for many languages this would mean iso-8859-<X>), use that encoding
- otherwise, no one-byte encoding
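
The three steps above can be sketched in Python roughly as follows. This is
only an illustration under stated assumptions, not a proposed implementation:
the function name, the set of one-byte codecs and the language table are all
hypothetical and deliberately tiny, and the interpreter itself would do this
in C at startup.

```python
import codecs

# Hypothetical, deliberately tiny tables, for illustration only.
ONE_BYTE_CODECS = {"iso8859-1", "iso8859-5", "iso8859-15", "koi8-r"}
POPULAR_BY_LANGUAGE = {"fr": "iso8859-15", "ru": "iso8859-5"}


def normalize(encoding):
    """Return Python's canonical codec name, or None if the codec is unknown."""
    try:
        return codecs.lookup(encoding).name
    except LookupError:
        return None


def choose_one_byte_encoding(locale_encoding, language):
    """Pick a one-byte internal encoding, or None if there is no good candidate."""
    # Step 1: the locale already uses a one-byte encoding.
    name = normalize(locale_encoding)
    if name in ONE_BYTE_CODECS:
        return name
    # Step 2: the locale's language has a popular one-byte encoding.
    popular = POPULAR_BY_LANGUAGE.get(language)
    if popular is not None:
        return popular
    # Step 3: no one-byte encoding.
    return None


latin9 = choose_one_byte_encoding("ISO-8859-15", "fr")
russian_utf8 = choose_one_byte_encoding("UTF-8", "ru")
japanese_utf8 = choose_one_byte_encoding("UTF-8", "ja")
```

On a real system the two arguments could be taken from
locale.getpreferredencoding() and the language part of locale.getlocale().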

This would ensure that, for example, Russian text on a system configured
with a Russian locale does not always end up using two bytes per
character internally.
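
A small check of the size difference, using Python's own codecs. The sample
word and the choice of iso-8859-5 are assumptions made for the example, and
UTF-16-LE stands in for UCS-2 here (the two are identical for characters in
the Basic Multilingual Plane):

```python
text = "Привет"  # six Cyrillic letters

# A Cyrillic one-byte encoding stores one byte per character.
one_byte = text.encode("iso8859-5")

# UTF-16-LE (same as UCS-2 for these characters) stores two bytes per character.
two_byte = text.encode("utf-16-le")

# Latin-1 cannot represent Cyrillic at all, so a fixed Latin-1 fast path
# would never apply to this text.
try:
    text.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

print(len(text), len(one_byte), len(two_byte), fits_latin1)
```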

Regards

Antoine.



