i18n and German umlauts

Sat Dec 15 09:44:34 EST 2001

"Eike Kock" <ekock at movatis.com> writes:

> I just got started with Python and internationalization using the
> gettext module, pygettext.py and msgfmt.py. Everything works fine
> except when it comes to special characters like German umlauts. Has
> anyone mastered this in a platform-independent way?

What problem are you facing? In the original source code, you should
only put English messages. That eases translation, since it is easier
to find a translator who speaks English than to find, say, someone
who speaks German as a foreign language.

In the message catalogs, use whatever encoding you like. Make sure you
put a GNU style PO header into each translation, like

msgid ""
msgstr ""
"Project-Id-Version: GNU grep 2.5e\n"
"POT-Creation-Date: 2001-03-07 00:02-0500\n"
"PO-Revision-Date: 2001-06-09 11:14+02:00\n"
"Last-Translator: Martin von Löwis <martin at mira.isdn.cs.tu-berlin.de>\n"
"Language-Team: German <de at li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ISO-8859-1\n"
"Content-Transfer-Encoding: 8bit\n"

That header lets gettext keep track of each catalog's charset, so
different catalogs can use different charsets.
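
To illustrate, here is a rough sketch that packs two tiny catalogs in
memory, one declared as ISO-8859-1 and one as UTF-8, and loads them
with gettext.GNUTranslations. It assumes Python 3, where gettext()
already returns Unicode and ugettext no longer exists. build_mo is a
hypothetical helper written only for this example; in practice
msgfmt.py or GNU msgfmt produces the .mo file.

```python
import gettext
import io
import struct


def build_mo(messages, charset):
    """Hypothetical helper: pack {msgid: msgstr} into the GNU .mo layout."""
    # The entry with the empty msgid carries the PO header and its charset.
    header = ("Content-Type: text/plain; charset=%s\n"
              "Content-Transfer-Encoding: 8bit\n" % charset)
    entries = dict(messages)
    entries[""] = header
    keys = sorted(entries)
    ids = [k.encode(charset) for k in keys]
    strs = [entries[k].encode(charset) for k in keys]

    n = len(keys)
    table_start = 7 * 4                      # seven 32-bit header fields
    data_start = table_start + 2 * n * 8     # two tables of (length, offset)

    id_table, str_table, blob = [], [], b""
    for raw in ids:
        id_table.append((len(raw), data_start + len(blob)))
        blob += raw + b"\0"                  # strings are NUL-terminated
    for raw in strs:
        str_table.append((len(raw), data_start + len(blob)))
        blob += raw + b"\0"

    # magic, version, count, msgid table, msgstr table, hash size, hash offset
    out = struct.pack("<7I", 0x950412DE, 0, n,
                      table_start, table_start + n * 8, 0, 0)
    for length, offset in id_table + str_table:
        out += struct.pack("<2I", length, offset)
    return out + blob


translation = {"Hello": "Gr\xfc\xdf Gott"}   # u-umlaut and sharp s

# Same messages, stored in two different catalog encodings.
latin1_mo = build_mo(translation, "ISO-8859-1")
utf8_mo = build_mo(translation, "UTF-8")

latin1_catalog = gettext.GNUTranslations(io.BytesIO(latin1_mo))
utf8_catalog = gettext.GNUTranslations(io.BytesIO(utf8_mo))

# Each catalog remembers the charset declared in its own header ...
print(latin1_catalog.charset(), utf8_catalog.charset())
# ... and both decode to the same Unicode string.
print(latin1_catalog.gettext("Hello") == utf8_catalog.gettext("Hello"))
```

The bytes on disk differ between the two catalogs, but because each
declares its charset, the decoded message is the same.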

At run-time, you have two options:

- get the translation from the catalog as a byte string, in the
  encoding of the catalog. This is what gettext.gettext will do.  It
  is the best approach if you print messages to the user's terminal,
  since that likely uses the same encoding that the translator was
  using.

- get the translation from the catalog as a Unicode string, by
  means of translation.ugettext (where the translation object
  is obtained through gettext.translation(domain)). This only works
  if the encoding of the catalog was declared in the catalog.
  This approach is best in the following cases:
  - you output the messages to a GUI toolkit that supports
    Unicode (like Tkinter).
  - you output the messages to a file with a well-known encoding
    (like HTML). In this case, you will need to convert the Unicode
    string to the encoding of the output file.
  - you output the messages to a terminal, and you can find out
    what the encoding of the terminal is (e.g. through
    locale.nl_langinfo(locale.CODESET), available as of Python 2.2).
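
The output side of the Unicode approach might look roughly like the
sketch below. It assumes Python 3 syntax; the message literal stands
in for whatever the catalog returned, and terminal_encoding is a
hypothetical helper, not part of the locale module. The fallback to
locale.getpreferredencoding is my own assumption for platforms where
nl_langinfo is missing.

```python
import codecs
import locale


def terminal_encoding():
    """Hypothetical helper: best-effort guess at the terminal's encoding."""
    try:
        # Reflects the locale currently in effect; programs normally call
        # locale.setlocale(locale.LC_ALL, "") once at startup beforehand.
        name = locale.nl_langinfo(locale.CODESET)
    except AttributeError:
        # nl_langinfo/CODESET do not exist on every platform.
        name = locale.getpreferredencoding(False)
    name = name or "ascii"
    try:
        codecs.lookup(name)
    except LookupError:
        name = "ascii"                       # unknown codec: play it safe
    return name


# Stand-in for a Unicode string obtained from the catalog.
message = "Gr\xfc\xdf Gott"

# File with a well-known encoding: convert explicitly to that encoding.
html_bytes = message.encode("iso-8859-1")

# Terminal: convert to whatever the terminal uses, replacing characters
# it cannot represent instead of raising UnicodeEncodeError.
term_bytes = message.encode(terminal_encoding(), "replace")

print(terminal_encoding(), html_bytes, term_bytes)
```

Using "replace" on the terminal path is a design choice: a question
mark in place of an umlaut is usually preferable to a traceback.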

HTH,
Martin

P.S. Thanks for confirming that you do use pygettext.py and
msgfmt.py. Please note that, as an option, you could also use the GNU
gettext tools for these tasks; they are likely faster than the
pure-Python versions.