International characters and Python's string functions on Linux
John Max Skaller
skaller at maxtal.com.au
Thu Aug 5 17:39:07 EDT 1999
On Thu, 5 Aug 1999 15:37:13 +0200, earlybird at mop.no (Alexander Staubo) wrote:
>I'm running Python on Slackware Linux, and it does not seem to provide a
>full international character set in the string.uppercase and
>string.lowercase constants: only A-Z, which isn't enough.
>
>Having studied the string module code a bit, I realize that Python is at
>the mercy of the C API and its string/locale functions, so I'm thinking I
>must probably configure Linux for a specific locale. Right? If so, how?
Wrong. Python doesn't use C locale functions.
Also, it doesn't support ISO-10646 directly. If you wish to support
any internationalisation, the recommended approach is to conform
to ISO-10646, which is an International Standard.
You can write your own code now, and you can borrow
mine: see
http://www.triode.net.au/~skaller/unicode/index.html
for documented sources.
In Python today, you can support
internationalisation using the UTF-8 encoding, as my
literate programming tool interscript does. UTF-8 is,
in my opinion, the best option in Python today, since it
is ASCII compatible, and will work with 8 bit strings as
Python today already has.
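A minimal sketch of what "ASCII compatible" means in practice. It assumes a modern Python 3 interpreter, where text and bytes are separate types, rather than the 8 bit strings discussed above. The helper name utf8_bytes and the sample strings are only illustrative.

```python
# Sketch only: assumes Python 3; utf8_bytes is a hypothetical helper name.
def utf8_bytes(text):
    """Return the UTF-8 encoding of text as a list of byte values."""
    return list(text.encode("utf-8"))

ascii_text = "Hello"
mixed_text = "Blåbærsyltetøy"

# Pure ASCII text encodes to exactly the same byte values it already had.
ascii_unchanged = utf8_bytes(ascii_text) == [ord(c) for c in ascii_text]

# Every byte produced by a non-ASCII character has the high bit set,
# so it can never be mistaken for an ASCII character such as '/' or NUL.
non_ascii_bytes = [b for ch in mixed_text if ord(ch) > 127
                   for b in utf8_bytes(ch)]
all_high_bit = all(b >= 0x80 for b in non_ascii_bytes)

print(ascii_unchanged, all_high_bit)
```

Because of these two properties, code that only looks for ASCII delimiters in a byte string keeps working when the string carries UTF-8 data.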
The LAST thing you should do is give any
internal support to ISO-8859-x encodings such as Latin-1,
which are NOT UTF-8 compatible, nor are
they compatible with each other. In other words,
do NOT use any single (or double) byte international
character set encodings. Use UTF-8, or use arrays
of integers representing the 31 bit UCS-4
(direct) encoding of ISO-10646.
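To illustrate the incompatibility, here is a small sketch, again assuming Python 3 and its standard codecs. The function is_valid_utf8 and the sample word are made up for the example. It also shows the "array of integers" representation as a plain list of code points.

```python
# Sketch only: assumes Python 3; is_valid_utf8 is a hypothetical helper.
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

word = "blåbær"
latin1 = word.encode("latin-1")   # one byte per character
utf8 = word.encode("utf-8")       # two bytes for each non-ASCII character

# The Latin-1 bytes are not a valid UTF-8 sequence.
latin1_is_utf8 = is_valid_utf8(latin1)

# The same byte means different characters in different ISO-8859 parts.
same_byte = bytes([0xE5])
as_part1 = same_byte.decode("latin-1")
as_part5 = same_byte.decode("iso8859_5")

# Encoding-independent form: a list of integer code points.
code_points = [ord(ch) for ch in word]

print(latin1_is_utf8, as_part1 != as_part5, code_points)
```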
By the way, UTF-8 is the official
Linux encoding. Try
man utf-8
and replace the incorrect word 'unicode' with
ISO-10646 in the man page: UTF-8 provides
a full encoding of all 2^31 codepoints.
[Unicode is the first 2^16 of them]
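For the curious, a sketch of how the original UTF-8 scheme (as described in RFC 2279) reaches all 2^31 code points using sequences of one to six bytes. The function name encode_utf8_31bit is hypothetical, and the code assumes Python 3. Note that the later RFC 3629 restricts UTF-8 to U+10FFFF and at most four bytes, so standard codecs will reject the longer forms.

```python
# Sketch only: hypothetical helper following the original 1-6 byte layout.
def encode_utf8_31bit(cp):
    """Encode integer code point cp (0 <= cp < 2**31) as UTF-8 bytes."""
    if not 0 <= cp < 2**31:
        raise ValueError("code point out of 31 bit range")
    if cp < 0x80:
        return bytes([cp])          # ASCII: a single unchanged byte
    # (exclusive upper limit, sequence length, lead byte marker)
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if cp < limit:
            break
    # Peel off 6 bits at a time into continuation bytes (10xxxxxx).
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # The remaining high bits go into the lead byte.
    return bytes([lead | cp] + tail[::-1])

print(encode_utf8_31bit(0x20AC).hex())
print(encode_utf8_31bit(0x7FFFFFFF).hex())
```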
John Max Skaller ph:61-2-96600850
mailto:skaller at maxtal.com.au 10/1 Toxteth Rd
http://www.maxtal.com.au/~skaller Glebe 2037 NSW AUSTRALIA