[I18n-sig] Re: gettext in the standard library

Martin von Loewis loewis@informatik.hu-berlin.de
Mon, 4 Sep 2000 20:01:48 +0200 (MET DST)

> Problems would arise if the source strings were recoded, between string
> extraction by POT tools, and string usage for translation at run-time.
> Python will likely "internalise" or convert Unicode strings from UTF-8,
> and this is a change of representation.

Currently, to put Unicode strings into source code, you'll have to use
\u escapes in your source (e.g. print u"\u263A"). I'm not aware of any
editor that transparently displays these beasts.
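
To show what such an escape denotes, here is a minimal sketch. It assumes a
current Python 3 interpreter, which this message predates but which still
accepts the u prefix. The variable names are only illustrative.

```python
import unicodedata

# The escape is plain ASCII in the source file; the interpreter turns it
# into a single Unicode character when it compiles the literal.
smiley = u"\u263A"

name = unicodedata.name(smiley)      # official Unicode character name
utf8_bytes = smiley.encode("utf-8")  # the same character encoded as UTF-8

print(len(smiley), name, utf8_bytes)
```

The editor only ever shows the six ASCII characters of the escape, which is
why such literals are awkward to read and to translate.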

So if you want to have non-English msgid strings using the Unicode
standard (rather than Unicode objects), your best bet is probably to
encode the Python source as UTF-8. As a result, you'll use byte
strings as parameters to _, which is supported well by the API.

[As a side note: I would have preferred if u"" strings had UTF-8
 inside them. As it is, I doubt anybody will use them for things
 other than WHITE SMILING FACE].

With byte strings, Python won't do any internalisation, so at run
time, you'll always have the same byte string that you got at
extraction time.
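
A rough sketch of that point, assuming Python 3 bytes literals as a stand-in
for the byte strings discussed here. CATALOG and this _ are hypothetical
placeholders built on a plain dict, not the gettext API itself.

```python
# UTF-8 bytes of a German msgid, written with escapes so the example
# does not depend on the encoding of this file.
MSGID = b"Gr\xc3\xbc\xc3\x9fe"

# Hypothetical catalog keyed by the exact bytes seen at extraction time.
CATALOG = {MSGID: b"Greetings"}

def _(msgid):
    """Hypothetical lookup: return the translation, or the msgid itself."""
    return CATALOG.get(msgid, msgid)

# The same bytes at run time find the entry.
translated = _(b"Gr\xc3\xbc\xc3\x9fe")

# The same text recoded to Latin-1 is a different byte string and misses.
recoded = MSGID.decode("utf-8").encode("latin-1")
missed = _(recoded)

print(translated, missed)
```

The lookup is a byte-for-byte comparison, so it works as long as nothing
recodes the string between extraction and run time.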