[Python-Dev] Small issues in gettext support
Gustavo Niemeyer
niemeyer at conectiva.com
Sun Apr 25 19:40:19 EDT 2004
Hello folks,
I've been working with gettext support in Python, and found some
issues I'd like to discuss with you.
First, I've noticed a difference between Python's own gettext
implementation (gettext.py) and GNU gettext in the encoding of the
non-unicode strings they return. Compare the two lines printed by
this code:
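import gettext
import locale

# Pick up the user's locale settings from the environment.
locale.setlocale(locale.LC_ALL, "")

# Select the same message domain in both implementations.
locale.textdomain("apt-cdrom-registry")
gettext.textdomain("apt-cdrom-registry")

# First line: GNU gettext (C library); second line: gettext.py.
print locale.gettext("Choose the available CDROMs from the list below")
print gettext.gettext("Choose the available CDROMs from the list below")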
Running it prints the following:
Escolha os CDROMs disponíves na lista abaixo
Escolha os CDROMs disponÃves na lista abaixo
The reason for this difference is clear: GNU gettext defaults to the
current locale when returning encoded strings, while gettext.py
returns strings in the encoding used in the .mo file. The fix is
simply to change the following code
# Encode the Unicode tmsg back to an 8-bit string, if possible
if self._charset:
    return tmsg.encode(self._charset)
to use the system encoding (sys.getdefaultencoding()) instead of
self._charset.
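To make the mismatch concrete, here is a minimal sketch of how such
garbling can arise. It assumes the .mo catalog is encoded in UTF-8 and
the terminal expects ISO-8859-1; neither charset is stated above, and
the variable names are purely illustrative.

```python
# Assumed charsets (not stated in the message above).
catalog_charset = "utf-8"        # encoding declared in the .mo file
terminal_charset = "iso-8859-1"  # encoding the user's locale expects

# The translated message as Unicode, as gettext.py holds it internally.
tmsg = u"Escolha os CDROMs dispon\xedves na lista abaixo"

# Current behaviour: encode with the catalog charset.
catalog_bytes = tmsg.encode(catalog_charset)
# Proposed behaviour: encode with the charset the locale expects.
locale_bytes = tmsg.encode(terminal_charset)

# What the terminal displays when it interprets each byte string.
shown_current = catalog_bytes.decode(terminal_charset)
shown_proposed = locale_bytes.decode(terminal_charset)

print(shown_current == tmsg)   # False: the accented letter is garbled
print(shown_proposed == tmsg)  # True: the text survives intact
```

Under these assumptions the accented letter becomes two bytes in
UTF-8, and a Latin-1 terminal renders them as "Ã" followed by an
invisible soft hyphen, which matches the second line of output above.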
On a related note, I've also noticed that we're currently missing
bind_textdomain_codeset() support. This function sets the codeset
in which the translated strings of a given domain are returned.
So, I'd like to implement the following changes:
- Change the default codeset used by gettext.py in functions
  returning an encoded string to match the system encoding.
- Introduce bind_textdomain_codeset() in locale.
- Introduce bind_textdomain_codeset() in gettext.py, implementing
  equivalent functionality.
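
A rough sketch of how the gettext.py side of this proposal might look.
The registry, the helper encode_translation(), and the fallback to
locale.getpreferredencoding() are assumptions made for illustration,
not existing gettext.py code. The convention that omitting the codeset
only queries the current value mirrors GNU bind_textdomain_codeset().

```python
import locale

# Hypothetical registry mapping a message domain to its output codeset.
_domain_codesets = {}


def bind_textdomain_codeset(domain, codeset=None):
    """Set the output codeset for a domain and return the current one.

    With codeset=None nothing is changed; the bound codeset is
    returned, or None if the domain has no binding yet.
    """
    if codeset is not None:
        _domain_codesets[domain] = codeset
    return _domain_codesets.get(domain)


def encode_translation(domain, tmsg):
    """Encode a Unicode translation for the given domain.

    Uses the codeset bound to the domain, falling back to the
    locale's preferred encoding (an assumed default for this sketch).
    """
    codeset = _domain_codesets.get(domain) or locale.getpreferredencoding()
    return tmsg.encode(codeset)


bind_textdomain_codeset("apt-cdrom-registry", "utf-8")
encoded = encode_translation("apt-cdrom-registry", u"dispon\xedves")
```

Keeping the binding per domain lets an application request a fixed
codeset for one catalog without affecting the default used by others.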
Comments?
--
Gustavo Niemeyer
http://niemeyer.net