i18n and German umlauts
Martin von Loewis
loewis at informatik.hu-berlin.de
Sat Dec 15 15:44:34 CET 2001
"Eike Kock" <ekock at movatis.com> writes:
> I just got started with Python and internationalization using the
> gettext module, pygettext.py and msgfmt.py. Everything works fine
> except when it comes to special characters like German umlauts. Has
> anyone mastered this in a platform-independent way?
What problem are you facing? In the original source code, you should
put only English messages: that eases translation, since it is easier
to find a translator who speaks English than one who speaks, say,
German as a foreign language.
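A minimal sketch of what "English messages in the source" looks like, in modern Python 3 rather than the 2001 API. The domain "myapp", the "locale" directory and report_missing() are hypothetical names used only for illustration. Because the msgid is plain English, a missing German catalog still leaves the program with readable output.

```python
import gettext

# "myapp" and the "locale" directory are hypothetical. install() binds _()
# in builtins; when no matching catalog is found it falls back to returning
# the English msgid unchanged, so the program still runs.
gettext.install("myapp", localedir="locale")


def report_missing(path):
    # The msgid is plain English; the German text belongs only in the
    # translator's catalog, never in the source code.
    return _("File not found: %s") % path
```

Running pygettext.py over such a file extracts "File not found: %s" as a msgid for the translator.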
In the message catalogs, use whatever encoding you like, but make sure
you put a GNU-style PO header into each translation, like
  msgid ""
  msgstr ""
  "Project-Id-Version: GNU grep 2.5e\n"
  "POT-Creation-Date: 2001-03-07 00:02-0500\n"
  "PO-Revision-Date: 2001-06-09 11:14+02:00\n"
  "Last-Translator: Martin von Löwis <martin at mira.isdn.cs.tu-berlin.de>\n"
  "Language-Team: German <de at li.org>\n"
  "Content-Type: text/plain; charset=ISO-8859-1\n"
The Content-Type line records the catalog's charset, so different
catalogs can use different charsets.
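To illustrate how the declared charset drives decoding, here is a rough, self-contained sketch in modern Python 3. It builds two tiny .mo catalogs in memory, one declaring ISO-8859-1 and one UTF-8, and loads both with gettext.GNUTranslations. The helpers build_mo() and german_catalog(), the "demo" project name and the "Greetings" message are hypothetical; normally msgfmt.py (or GNU msgfmt) produces the .mo file from the .po file, and this packer only covers the minimal little-endian layout without a hash table.

```python
import io
import struct
import gettext


def build_mo(messages):
    """Pack a {msgid: msgstr} dict of byte strings into .mo file bytes.

    Minimal sketch: little-endian, format revision 0, no hash table.
    The header entry (msgid b"") sorts first, so the reader learns the
    charset before it decodes any translated message.
    """
    ids = sorted(messages)
    n = len(ids)
    orig_table = 7 * 4                 # fixed header: seven 32-bit words
    trans_table = orig_table + 8 * n   # each entry: (length, offset)
    data = trans_table + 8 * n         # string data follows both tables

    strings = b""
    entries = []
    for s in ids + [messages[i] for i in ids]:
        entries.append((len(s), data + len(strings)))
        strings += s + b"\0"           # strings are NUL-terminated

    out = struct.pack("<7I", 0x950412DE, 0, n, orig_table, trans_table, 0, 0)
    for length, offset in entries:
        out += struct.pack("<II", length, offset)
    return out + strings


def german_catalog(charset):
    # Same translation, encoded in whatever charset the header declares.
    header = ("Project-Id-Version: demo 1.0\n"
              "Content-Type: text/plain; charset=%s\n" % charset)
    return build_mo({
        b"": header.encode("ascii"),
        b"Greetings": "Grüße".encode(charset),
    })


latin1 = gettext.GNUTranslations(io.BytesIO(german_catalog("ISO-8859-1")))
utf8 = gettext.GNUTranslations(io.BytesIO(german_catalog("UTF-8")))
```

Both latin1.gettext("Greetings") and utf8.gettext("Greetings") should yield the same Unicode string, because each catalog is decoded with the charset from its own header.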
At run time, you have two options:
- get the translation from the catalog as a byte string, in the
  encoding of the catalog. This is what gettext.gettext does. It is
  the best approach if you print messages to the user's terminal,
  since that likely uses the same encoding that the translator was
  using.
- get the translation from the catalog as a Unicode string, by
  means of translation.ugettext (where the translation object
  is obtained through gettext.translation(domain)). This only works
  if the catalog declares its encoding in its PO header.
  This approach is best in the following cases:
  - you output the messages to a GUI toolkit that supports
    Unicode (like Tkinter).
  - you output the messages to a file with a well-known encoding
    (like HTML). In this case, you will need to convert the Unicode
    string to the encoding of the output file.
  - you output the messages to a terminal, and you can find out
    what the encoding of the terminal is (e.g. through
    locale.nl_langinfo(locale.CODESET), available as of Python 2.2).
P.S. Thanks for confirming that you do use pygettext.py and
msgfmt.py. Note that you could also use GNU gettext (xgettext and
msgfmt) for these tasks; the GNU tools are likely faster than the
pure-Python versions.