[Python-Dev] Re: PEP 292, Simpler String Substitutions

23 Jun 2002 15:05:45 -0400

[Barry A. Warsaw]
> "NH" == Neil Hodgson <nhodgson@bigpond.net.au>

> Another i18n approach altogether uses explicit message ids instead of
> using the source string as the implicit message id, but that has a
> whole 'nuther set of issues.

That is the `catgets' approach, as opposed to the `gettext' approach.  I've
seen some people hold religious feelings in either direction.

Roughly speaking, `catgets' is faster, as you directly index the translated
string without having to hash the original string first.  It also makes it
easier to translate single words or strings offering little translation
context, as English ambiguities are resolved by using different message ids
for the same text fragment.

On the other hand, `gettext' can be made nearly as fast as `catgets', but
only _if_ we use efficient hashing combined with proper caching.  The real
advantage of `gettext' is that internationalised sources are more legible
and easier to maintain, since the original string appears in plain view
exactly where it is meant to be used.
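
To make the contrast concrete, here is a rough sketch of the two lookup
styles.  Plain dictionaries stand in for the real catalog files, and the
names (CATGETS_FR, GETTEXT_FR, catgets_like, gettext_like) are invented for
the illustration; this is not the API of either library.

```python
# Hypothetical French catalogs; plain dicts stand in for real catalog files.

# catgets style: the program refers to each message by an explicit id, so
# the same English word can carry two ids and two translations.
CATGETS_FR = {
    1: "Ouvrir",   # "Open" as a menu command (verb)
    2: "Ouvert",   # "Open" as a status (adjective)
}

# gettext style: the English source string is itself the lookup key, so it
# stays readable in the program text.
GETTEXT_FR = {
    "Open": "Ouvrir",
    "File not found": "Fichier introuvable",
}


def catgets_like(msg_id, default):
    """Look up by explicit id, falling back to the default text."""
    return CATGETS_FR.get(msg_id, default)


def gettext_like(source):
    """Look up by source string, falling back to the source itself."""
    return GETTEXT_FR.get(source, source)


print(catgets_like(1, "Open"), catgets_like(2, "Open"))
print(gettext_like("Open"), gettext_like("Not translated yet"))
```

The call sites show the trade-off: catgets_like(2, "Open") resolves the
ambiguity but says little to the reader, while gettext_like("Open") reads
naturally but cannot tell the verb from the adjective.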

A problem with both is that the implementations bundled with various systems
are often weak or buggy, when they exist at all.  Portability is
notoriously difficult.  Linux and GNU `gettext' rate rather nicely.
But nothing is perfect.

> [...] you tend to get paranoid about changing /any/ source string, say
> to remove a comma, adjust whitespace, or fix a preposition.  Any change
> means a dozen language teams have a new message they must translate
> (unless you can mechanically fix them for them).

This is why the responsibilities of maintainers and translators ought
to be well split.  If the maintainer feels responsible for the work that
string changes induce on the translation teams, comfort is lost.
The maintainer should work in all freedom, and the problem of
later reflecting tiny editorial changes into PO `msgstr' fully pertains to
translators, with the possible help of automatic tools.  Translators should
be prepared for such changes.  If the split of responsibilities is not
fully understood and accepted, internationalisation becomes much heavier,
in practice, than it has to be.
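
As an illustration of what such an automatic tool might do, here is a small
sketch that carries old translations over to slightly edited source strings
and flags them for review.  It uses Python's standard difflib; the function
name carry_over, the 0.8 cutoff and the sample strings are assumptions made
for the example, and it does not reproduce how any existing tool works.

```python
import difflib


def carry_over(old_catalog, new_msgids, cutoff=0.8):
    """Return {msgid: (msgstr, fuzzy)} for the new list of msgids.

    old_catalog maps old msgids to their translations.  A msgid that is
    unchanged keeps its translation; one that closely resembles an old
    msgid inherits that translation, marked fuzzy so that a translator
    reviews it; anything else starts out untranslated.
    """
    merged = {}
    for msgid in new_msgids:
        if msgid in old_catalog:
            merged[msgid] = (old_catalog[msgid], False)
            continue
        close = difflib.get_close_matches(
            msgid, list(old_catalog), n=1, cutoff=cutoff)
        if close:
            merged[msgid] = (old_catalog[close[0]], True)
        else:
            merged[msgid] = ("", False)
    return merged


# Hypothetical data: the maintainer changed a comma into a semicolon.
old = {"File not found, sorry": "Fichier introuvable, pardon"}
result = carry_over(old, ["File not found; sorry", "Quit"])
```

With something of this kind on the translators' side, the maintainer can
fix a comma without asking anyone first.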

>     >> The feature would be useless to me if I had to pass some explicit
>     >> dictionary into the _() method.  It makes writing i18n code
>     >> extremely tedious.

>     NH>    I think you are overstating the problem here.

> Trust me, I'm not.  [...] being forced to pass in the explicit bindings
> is a big burden in terms of maintainability and readability.

>     NH> Not making bindings explicit may mean that translators use
>     NH> other variables available at the translation point leading to
>     NH> unexpected failures when internal details are changed.

> I18n'ing a program means you have to worry about a lot more things.  [...]

Internationalisation should not add a significant burden on the programmer.
I mean, if there is something cumbersome in the internationalisation of
a string, then there is something cumbersome in that string outside any
internationalisation context.

If internationalisation really adds a significant burden, this is a
signal that internationalisation has not been implemented well enough in
the underlying language, or else that it is not being used correctly.
I really think that the internationalisation of strings should be designed
so it is a light activity and a negligible burden for the maintainer.  (And
of course, translators should also get help in the form of proper files and
tools.)
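
For readers who have not followed the whole thread, the two calling styles
debated in the quotes above might be sketched as follows.  The $name syntax
is the one PEP 292 proposes, and string.Template is what later Python
versions ship for it; CATALOG, translate, explicit, implicit and greet are
names invented for this example, and sys._getframe is a CPython-specific
way to reach the caller's variables, not necessarily what any real _()
does.

```python
import sys
from string import Template

# Hypothetical catalog mapping source strings to French translations.
CATALOG = {"$name has $count messages": "$name a $count messages"}


def translate(source):
    """Stand-in for a real catalog lookup."""
    return CATALOG.get(source, source)


def explicit(source, **bindings):
    """Explicit style: the caller passes every binding by hand."""
    return Template(translate(source)).safe_substitute(bindings)


def implicit(source):
    """Implicit style: bindings come from the caller's own namespace."""
    caller = sys._getframe(1)
    names = dict(caller.f_globals)
    names.update(caller.f_locals)   # locals shadow globals
    return Template(translate(source)).safe_substitute(names)


def greet():
    name, count = "Guido", 3
    a = explicit("$name has $count messages", name=name, count=count)
    b = implicit("$name has $count messages")
    return a, b
```

Both calls in greet() yield the same text.  The implicit form is lighter
for the programmer, at the price Neil mentions: a translation may come to
depend on any variable that happens to be visible at the call site.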

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard