[Python-Dev] Small issues in gettext support

Barry Warsaw barry at python.org
Mon Apr 26 10:13:14 EDT 2004


On Sun, 2004-04-25 at 19:40, Gustavo Niemeyer wrote:
> Hello folks,
> 
> I've been working with gettext support in Python, and found some
> issues I'd like to discuss with you.
> 
> First, I've noticed that there's a difference in the internal
> implementation of gettext and GNU gettext regarding the returned
> encoding on non-unicode strings. Notice the difference in the
> result of this code:
> 
>    import gettext
>    import locale
>    locale.setlocale(locale.LC_ALL, "")
>    locale.textdomain("apt-cdrom-registry")
>    gettext.textdomain("apt-cdrom-registry")
>    print locale.gettext("Choose the available CDROMs from the list below")
>    print gettext.gettext("Choose the available CDROMs from the list below")
> 
> This has shown the following:
> 
>    Escolha os CDROMs disponíves na lista abaixo
>    Escolha os CDROMs disponíves na lista abaixo
> 
> The reason for this difference is clear: GNU gettext defaults to the
> current locale when returning encoded strings, while gettext.py
> returns strings in the encoding used in the .mo file. The fix is
> simply changing the following code
> 
>    # Encode the Unicode tmsg back to an 8-bit string, if possible
>    if self._charset:
>        return tmsg.encode(self._charset)
> 
> to use the system encoding (sys.getdefaultencoding()) instead of
> self._charset.

I'd be worried most about backwards compatibility, since the module has
worked this way since its early days.  Also, wouldn't this be an
opportunity for getting lots of UnicodeErrors?  E.g. my system encoding
is 'ascii' so gettext() would fail for catalogs containing non-ascii
characters.  I shouldn't have to change my system encoding just to avoid
errors, but with your suggestion, wouldn't that make many catalogs
basically unusable for me?
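
To make the concern concrete, here is a minimal sketch of the failure mode.
It assumes the translated message is held as a Unicode string and is then
encoded with whatever charset is chosen. The helper encode_message() is
hypothetical and only stands in for the encode step quoted above; it is not
the code in gettext.py.

```python
# Translated message containing one non-ASCII character (i-acute, U+00ED).
tmsg = "Escolha os CDROMs dispon\xedves na lista abaixo"


def encode_message(message, charset):
    """Hypothetical stand-in for the 'encode tmsg back to 8-bit' step."""
    return message.encode(charset)


# Encoding with a charset that can represent the text works.
catalog_bytes = encode_message(tmsg, "utf-8")

# Encoding with 'ascii' (a common system default encoding) fails.
try:
    encode_message(tmsg, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

With an 'ascii' default, every catalog entry containing a character outside
ASCII would hit the except branch.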

> Regarding a similar issue, I've also noticed that we're currently
> missing bind_textdomain_codeset() support. This function changes
> the codeset used to return the translated strings.
> 
> So, I'd like to implement the following changes:
> 
> - Change the default codeset used by gettext.py in functions
>   returning an encoded string to match the system encoding.
> - Introduce bind_textdomain_codeset() in locale.
> - Introduce bind_textdomain_codeset() in gettext.py implementing
>   an equivalent functionality.

Would adding bind_textdomain_codeset() provide a way for the
application to change the default encoding?

If so, I'd be in favor of adding bind_textdomain_codeset() but not
changing the default encoding for returned strings.  Then update the
documentation to describe current behavior and how to change it via that
function call.
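
Roughly, the behavior I have in mind could look like the sketch below. It is
only an illustration under assumptions: CatalogSketch, set_output_charset()
and the dict-based catalog are made-up names, not the gettext.py API. The
default stays the catalog's own charset, and the application can opt in to a
different output codeset.

```python
class CatalogSketch:
    """Hypothetical catalog: keeps the .mo charset as the default output."""

    def __init__(self, messages, charset):
        self._catalog = messages        # msgid -> translated Unicode string
        self._charset = charset         # charset declared by the catalog
        self._output_charset = None     # optional application override

    def set_output_charset(self, charset):
        # Plays the role bind_textdomain_codeset() would play for one domain.
        self._output_charset = charset

    def gettext(self, message):
        # Fall back to the untranslated message when there is no entry.
        tmsg = self._catalog.get(message, message)
        # Use the override if one was set, else the catalog charset.
        return tmsg.encode(self._output_charset or self._charset)


catalog = CatalogSketch({"hello": "ol\xe1"}, "utf-8")
default_bytes = catalog.gettext("hello")       # encoded as UTF-8

catalog.set_output_charset("iso-8859-1")
override_bytes = catalog.gettext("hello")      # encoded as ISO-8859-1
```

Existing callers that never set an output charset see no change, which is
the backwards-compatibility property I care about.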

-Barry




