[Python-ideas] Combine f-strings with i18n

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Mon Sep 17 13:42:21 EDT 2018

Hans Polak writes:
 > On 17/09/18 09:53, Niki Spahiev wrote:
 > >
 > > Is it possible to use f-strings when making multilingual software?
 > > When i write non-hobby software translation is hard requirement.
 > At this moment, it seems that this is not possible.

No, it's not possible.

 > If a user has the navigator configured for English, I have to
 > return English (if I am doing i18n).

This is understood.  Nobody is telling you what you want is an
unreasonable desire.  I'm telling you that I'm pretty sure your
proposed syntax isn't going to happen, because it requires deep
changes in the way Python evaluates expressions as far as I can see.

 > That's why I would like to see a parameter that can be passed to
 > the f-string.

This doesn't make sense to me.  Such configurations are long-lasting.
In this context, the POSIX model (where the target language, or
priority list of languages, is configured in the environment) is
reasonable.  There's no good reason for passing the language every
time a string is formatted throughout an interaction with such a user.
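
As a rough sketch of that model with the standard gettext module: when
no `languages` argument is given, `gettext.translation` picks the
language from the environment variables LANGUAGE, LC_ALL, LC_MESSAGES,
and LANG.  The domain name "myapplication" and the `greet` helper are
placeholders, and `fallback=True` is an assumption made only so the
sketch runs without any message catalogs installed.

```python
import gettext

def greet(user):
    # No languages= argument: the language comes from the environment
    # (LANGUAGE, LC_ALL, LC_MESSAGES, LANG), not from the call site.
    # fallback=True returns a pass-through translation object when no
    # catalog for the "myapplication" domain is found.
    translation = gettext.translation("myapplication", fallback=True)
    _ = translation.gettext
    return _("Hello, {user}!").format(user=user)

message = greet("Niki")
print(message)
```

With no catalog installed the message comes back untranslated; the point
is only that the caller never names a language.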

What we want is a way to tell the f-string to translate itself, and
optionally specify a language.

 > I don't think this should be too problematic, really.

Your proposal to use method syntax is *definitely* problematic.  The
f-string is an expression, and must be evaluated to a str first
according to the language definition.
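
A minimal illustration of that point under the current rules (Python
3.6 and later): by the time any attribute is looked up, the f-string is
already an ordinary str, so a hypothetical `.language()` method would
see only the interpolated result, never the template.

```python
user = "Niki"
greeting = f"Hi {user}"

# The f-string has already been evaluated: the result is a plain str
# with the value interpolated and no trace of the "{user}" placeholder.
is_plain_str = type(greeting) is str
template_survives = "{user}" in greeting

# str has no "language" method, so the proposed spelling fails at
# runtime today.
try:
    f"Hi {user}".language("es")
    error = None
except AttributeError as exc:
    error = exc

print(greeting, is_plain_str, template_survives, type(error).__name__)
```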

 > The compiler can then rewrite these to normal unicode strings. For 
 > instance: f'Hi {user}'.language('es') would become T(_('Hi {user}'), 
 > 'es', user=user)

It could, but it's not going to.  Implementing that with a reasonable
amount of backward compatibility requires two tokens of lookahead and
a new keyword as far as I can see.  The problem is that unless the
.language method is invoked on the f-string in the same expression,
the f-string needs to be converted to a string immediately, as it is
in Python 3.6 and 3.7.  To decide whether to do this when a method is
invoked, the parser needs to read the f-string (lookahead = 0), the
"." (lookahead = 1), and the token "language" (lookahead = 2).  To
recognize this as the special construct, the parser also has to know
that token.  "Keyword" in this context simply means "a token that the
parser knows about", but in general in Python we want keywords to be
reserved to the language, and reserving one is considered a very high
cost.

What could work is an extension to the formatting language.  I suggest
abusing the *conversion flag*.  (It's an abuse because I'm going to
apply it to the whole f-string, while the current Language Reference
says it's applied to the value being formatted.[1])  This flag would only
be allowed as the first item in the string.  The idea is that
`f"{lang!g}Hello, {user}!"` would be interpreted as

    _ = get_gettext(lang)
    _("Hello, {user}!").format(user=user)

and `f"{!g}Hello, {user}!"` as `_("Hello, {user}!").format(user=user)`,
reusing the most recent value of `_`.  The "g" in "!g" stands for
"gettext", of course.  GNU xgettext can be taught to recognize things
like `f"{...!g}` as translatable string markers; I'm sure pygettext
can too.

I'm assuming the implementation of get_gettext from my earlier post,
reproduced at the end for reader convenience.

One warning about this syntax: I think the gettext module is a pretty
popular way to implement message localization.  However, I'm not sure
it's the only way, and I would guess that folks who use something else
would want to be able to use f-strings with that package, too.  So
there may need to be a way to configure the translation engine.
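
One way such configuration might look, purely as an assumption for
discussion: a module-level hook that maps a language to a translation
function.  Every name here (`set_translation_engine`, `translate`,
`dict_engine`, `CATALOG`) is hypothetical, and the dictionary-backed
engine is a toy stand-in for a non-gettext package.

```python
def identity_engine(language):
    """Default engine: return every message unchanged."""
    return lambda message: message

_engine = identity_engine

def set_translation_engine(factory):
    """Install a callable mapping a language to a str -> str translator."""
    global _engine
    _engine = factory

def translate(template, language, **values):
    # Look up the template with the configured engine, then interpolate.
    return _engine(language)(template).format(**values)

# Toy catalog standing in for some other localization package.
CATALOG = {"es": {"Hello, {user}!": "¡Hola, {user}!"}}

def dict_engine(language):
    table = CATALOG.get(language, {})
    return lambda message: table.get(message, message)

before = translate("Hello, {user}!", "es", user="Niki")
set_translation_engine(dict_engine)
after = translate("Hello, {user}!", "es", user="Niki")
print(before)
print(after)
```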

>      import gettext
>
>      # Use duck-typing of gettext.translation objects
>      class NullTranslation:
>          def __init__(self):
>              self.gettext = lambda s: s
>      # The mutable default argument is deliberate: it caches one
>      # translation object per language across calls.
>      def get_gettext(language, translation={'C': NullTranslation()}):
>          if language not in translation:
>              translation[language] = \
>                  gettext.translation('myapplication', languages=[language])
>          return translation[language].gettext
> and
>      # This could be one line, but I guess in many cases you're likely
>      # to use the gettext function repeatedly.  Also, use of the
>      # _() idiom marks translatable strings for translators.
>      _ = get_gettext(language)
>      _(translatable_string).format(key=value...)

[1]  I'm not sure whether "the g conversion is applied to the 'lang'
variable, and the effect is to set the global 'translate' function" is
pure sophistry or not, but I don't find it convincing.  YMMV
