International characters and Python's string functions on Linux
andy at robanal.demon.co.uk
Tue Aug 10 21:54:52 CEST 1999
>for documented sources.
> In Python today, you can support
>internationalisation using the UTF-8 encoding, as my
>literate programming tool interscript does. UTF-8 is,
>in my opinion, the best option in Python today, since it
>is ASCII compatible, and will work with 8 bit strings as
>Python today already has.
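A minimal sketch of the ASCII-compatibility point quoted above. It assumes modern Python 3, where `str` is Unicode and `bytes` is an 8-bit string, so it is an illustration rather than anything that ran in 1999. The sample strings are made up.

```python
# Sketch assuming modern Python 3, where str is Unicode and bytes is 8-bit.
ascii_text = "Hello, world"
# Pure ASCII text produces byte-identical output under ASCII and UTF-8.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

japanese = "日本語"
utf8_bytes = japanese.encode("utf-8")
# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so it can never
# be confused with an ASCII byte such as ',' or '\n'.
assert all(b >= 0x80 for b in utf8_bytes)
print(len(japanese), len(utf8_bytes))  # 3 characters, 9 bytes

# Because of that, byte-oriented 8-bit string code (e.g. splitting a
# comma-separated record) still works on UTF-8 data without decoding it first.
record = "山田,東京".encode("utf-8")
fields = [f.decode("utf-8") for f in record.split(b",")]
print(fields)  # ['山田', '東京']
```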
Internationalisation is not just about which encoding you use; it is
about having the libraries in your app to convert between encodings.
For example, I use Python to build live gateways between Sybase servers holding
7000+ Japanese names and addresses in UTF-8, the same data on AS400s in
IBM's own undocumented encoding, and Shift-JIS on Windows. We have to
handle all three, as those are the systems we have to interface to.
We also have to encode text in whatever form our printers expect.
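A rough sketch of the gateway idea in modern Python 3: decode from the source encoding into Unicode, then encode to the target. `transcode` is a hypothetical helper name and the sample name is invented; the stdlib `shift_jis` codec is assumed (`cp932` is Microsoft's extended variant used on Windows). IBM's AS400 encoding is not covered, since it is described above as undocumented.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one codec to another, using Unicode as the pivot.

    Raises UnicodeDecodeError / UnicodeEncodeError if the data cannot be
    represented, rather than silently corrupting it.
    """
    return data.decode(source).encode(target)

# Hypothetical record: a name stored as UTF-8 on the database side.
name_utf8 = "山田太郎".encode("utf-8")

# Gateway to Windows: convert to Shift-JIS.
name_sjis = transcode(name_utf8, "utf-8", "shift_jis")
print(len(name_utf8), len(name_sjis))  # 12 bytes vs 8 bytes

# Round trip back to UTF-8 loses nothing for these characters.
assert transcode(name_sjis, "shift_jis", "utf-8") == name_utf8
```

Going through Unicode means each new system needs only one decoder and one encoder, rather than a separate converter for every pair of encodings.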
If you have a saner problem to deal with (European languages and one
OS), the key issues are still which encodings you receive data in,
which encodings you have to save it in, and whether you have to print it.
The choice of encoding is never made in isolation.
John Skaller's conversion utilities are a great start; I had
something uncannily similar, but it was written for clients and I
cannot release it. What Python really needs to handle
internationalisation well is not an even better wide string type, but
a standard library to convert all those other encodings to and from
UCS-2/UTF-8. Java does this out of the box.
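Later Python versions did add a registry of codecs to the standard library. The sketch below assumes modern Python 3; `to_utf8` and `to_ucs2` are hypothetical helper names. Python has no codec literally named UCS-2, so big-endian UTF-16 is used, which is identical to UCS-2 for characters in the Basic Multilingual Plane.

```python
import codecs

def to_utf8(data: bytes, encoding: str) -> bytes:
    """Convert bytes in any registered codec to UTF-8."""
    codecs.lookup(encoding)  # raises LookupError if the codec is unknown
    return data.decode(encoding).encode("utf-8")

def to_ucs2(text: str) -> bytes:
    """Encode text as big-endian UCS-2.

    UTF-16 matches UCS-2 only inside the Basic Multilingual Plane,
    so reject anything outside it instead of emitting surrogate pairs.
    """
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("text contains characters outside the BMP")
    return text.encode("utf-16-be")

print(to_utf8("café".encode("latin-1"), "latin-1"))  # b'caf\xc3\xa9'
print(to_ucs2("AB"))                                   # b'\x00A\x00B'
```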