[spambayes-dev] State of I18N

Hernán Martínez Foffani hernan at orgmf.com.ar
Wed Oct 20 18:56:45 CEST 2004


I ran some tests and managed to build a partially translated
version of the plugin.  Here is what I have found so far.

Please consider every paragraph I wrote ending with
"What do you think?", "Is this the right approach?" or
"Am I screwing up something?" kind of questions.


GET THE USER LANGUAGE

I'm using locale.getdefaultlocale() to get the user lang
preference.  I'll add the chance to let the user specify
another language in his config file.

A different approach would be to get Outlook's language-id
version (how?) instead of the user's locale and convert it
later to Python's RFC1766 format code.

Languages are "stackable", so they can naturally fall back
one onto the other if their translations are missing or
incomplete.
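
As a rough sketch of the stacking idea (the helper name and the
user_override argument are made up for illustration, nothing like
this is in SB yet), a locale code can be expanded into the ordered
list of languages to try.  In practice locale_code would come from
locale.getdefaultlocale()[0] and user_override from the config file:

```python
def language_chain(locale_code, user_override=None):
    """Return the language codes to try, most specific first.

    Hypothetical helper: user_override stands in for a language
    named in the user's config file.
    """
    chain = []
    for code in (user_override, locale_code):
        if not code:
            continue
        # Drop any encoding suffix such as ".UTF-8".
        code = code.split(".")[0]
        # "es_ES" yields "es_ES" and then the bare language "es".
        for candidate in (code, code.split("_")[0]):
            if candidate not in chain:
                chain.append(candidate)
    return chain


print(language_chain("es_ES"))           # ['es_ES', 'es']
print(language_chain("es_ES", "pt_BR"))  # ['pt_BR', 'pt', 'es_ES', 'es']
```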


PURE PYTHON LITERAL STRINGS

By using gettext, I found that adding a small class and a couple
of lines at the beginning of manager.py:BayesManager.__init__()
is enough to have the infrastructure for the Python code part.
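
A minimal sketch of what such a class might look like, assuming a
"Languages" directory and the domain name "messages" (both are
placeholders, and LanguageManager is a hypothetical name, not the
class from my working copy):

```python
import gettext


class LanguageManager:
    """Hypothetical holder for the active translation catalog."""

    def __init__(self, localedir, domain="messages"):
        self.localedir = localedir
        self.domain = domain
        self.translation = gettext.NullTranslations()

    def set_language(self, languages):
        # fallback=True returns a NullTranslations object instead of
        # raising when no catalog is found for any of the languages.
        self.translation = gettext.translation(
            self.domain, self.localedir,
            languages=languages, fallback=True)
        return self.translation.gettext


# The "couple of lines" that would go at the start of __init__():
_ = LanguageManager("Languages").set_language(["es_ES", "es"])

# With no catalog installed the original text comes back unchanged.
print(_("Training complete"))
```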

I'll leave asserts and LogDebug comment strings out of the
translation.  The rest would be tedious but easy to do.  I'll
follow Tony's advice and start the addin translation after
sb_server, but I want to define the translation tools first.

Some things may require some rework though:
  - Multiline messages that use one print statement per line.
    (Replace those by "xxx\r\n" "yyy\r\n" "zzz\r\n"?)
  - Those that mix literal strings with parameters by concatenating
    them instead of using placeholders.
  - The special case of literals mixed with HTML.
    (see addin.py:ShowClues)
  - There are literal words around that might be used for
    program flow (eg: "ham","spam",etc.)  Not sure about it.

Anyway, no big surprises.
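
To illustrate the first two rework items, here is a small sketch.
The messages are invented for the example (they are not taken from
the SB sources) and _ is only a stand-in for the gettext lookup:

```python
def _(message):
    # Stand-in for the gettext lookup; returns the message unchanged.
    return message


folder, count = "Junk", 3

# Hard to translate: the sentence is split into fragments whose
# order is fixed by the code.
concatenated = _("Moved ") + str(count) + _(" messages to ") + folder

# Easier: one message with named placeholders that a translator
# can reorder freely.
templated = _("Moved %(count)d messages to %(folder)s") % {
    "count": count, "folder": folder}

# Multiline text kept as a single message instead of one print
# per line.
report = _("Training finished.\r\n"
           "Ham: %(ham)d\r\n"
           "Spam: %(spam)d\r\n") % {"ham": 10, "spam": 4}

print(templated)
```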

gettext seems good for several reasons:
  - python officially blessed it,
  - python dist has (almost) all the tools a translator needs,
  - minimizes changes on source code,
  - includes fallbacks to language/country and
  - I like it.


DIRECTORIES

The gettext library uses {SomeDir}/langcode/LC_MESSAGES/message_file
layout.  In the sources I'm working on I made {SomeDir} as
"Languages" parallel to Outlook2000, pspam and scripts directories.

Doing so I can place all the translated messages files in one
directory and manager.py&co can reach them at ../Languages.
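
A quick way to check the layout, sketched with a throwaway
directory and an empty placeholder catalog.  The domain name
"spambayes" is an assumption; gettext.find() only looks at which
catalog files exist, it does not parse them:

```python
import gettext
import os
import tempfile

# Build Languages/es/LC_MESSAGES/spambayes.mo in a temp directory.
root = tempfile.mkdtemp()
localedir = os.path.join(root, "Languages")
catalog_dir = os.path.join(localedir, "es", "LC_MESSAGES")
os.makedirs(catalog_dir)
open(os.path.join(catalog_dir, "spambayes.mo"), "wb").close()

# Asking for "es_ES" falls back to the plain "es" catalog because
# gettext expands the locale into its less specific variants.
found = gettext.find("spambayes", localedir, languages=["es_ES"])
print(found)
```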

Is it too hard to test this layout for a frozen distribution?


DIALOGS

These were tricky.
The binary dist of SB imports them directly while the source addin
compiles them first.  I don't think it's necessary to
reproduce the same thing for foreign dialog.rc's.  A translator
can run rc2py at the command prompt to create the corresponding
.py file.

To imitate the fallback behaviour for dialogs I had to play
a bit with sys.path.  Setting a language, besides initializing
gettext, appends __file__/../Languages/langcode/DIALOGS and
__file__/../Languages/langcode[0:2]/DIALOGS to sys.path.
(Actually it's not __file__ but this_filename and it's not
[0:2] but the first "_" split.)

Say the locale is "es_ES"; then SB will try to import "i18n_dialog.py"
from "Languages/es_ES/DIALOGS", then from "Languages/es/DIALOGS".
If that fails it falls back to the current code.
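
Roughly, the sys.path trick looks like the sketch below.  The
function names are hypothetical and base_dir stands for the parent
of the "Languages" directory; this is not the exact code from my
working copy:

```python
import os
import sys


def dialog_dirs(base_dir, locale_code):
    """Directories to search for translated dialogs, in order."""
    language = locale_code.split("_")[0]
    codes = [locale_code]
    if language != locale_code:
        codes.append(language)
    return [os.path.join(base_dir, "Languages", code, "DIALOGS")
            for code in codes]


def load_dialogs(base_dir, locale_code, module_name="i18n_dialog"):
    """Import the translated dialogs, or return None so the caller
    can keep using the built-in ones."""
    for path in dialog_dirs(base_dir, locale_code):
        if path not in sys.path:
            sys.path.append(path)
    try:
        return __import__(module_name)
    except ImportError:
        return None
```

Because the more specific directory is appended first, a module in
"es_ES" wins over one in "es" when both exist.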


GETTEXTED DIALOGS

I mentioned before that I would like to make rcparser.py
gettext aware so a translator could work out a solution without
having to edit the resource dialogs.

The problem I found is that the output of the parser is a dict
that includes string literals that should be translated (like
captions or labels) and literals that should not (like font names.)
Once in the dict there's no way to differentiate between them.
As I expect that changes in the dict would imply changes all
over SB, I thought of a partial solution (call it a hack)
that may work.

AFAICT, SB does not use the output of rcparse directly but
through a previous rc2py step.  So how about a subclass
of str that overrides __repr__?  Then rcparse only needs to
create instances of this class for labels, captions, etc.,
not for font names (or every literal can be of this new type
and an attribute flag can drive __repr__ accordingly.)

Later, rc2py and imports of the dialogs .py would do the
right thing.  Something like this:

class gt_str(str):
    def __repr__(self):
        return "_(" + super(gt_str, self).__repr__() + ")"

>>> a="a"
>>> b=gt_str("a")
>>> a==b
True
>>> a
'a'
>>> b
_('a')
>>>

I said it's a partial solution because the dict that
rcparse generates does not equal (items of different types)
the one obtained by importing the dialog.  Does it matter?
(It's also "partial" because I haven't tested it yet. heh..)
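
To make the round trip concrete, here is an untested-in-SB sketch.
The dict is an invented stand-in for the parser output (the real
rcparse structure is richer) and the one-word "catalog" is just
for show:

```python
class gt_str(str):
    def __repr__(self):
        return "_(" + super().__repr__() + ")"


# Hypothetical fragment of parser output: the caption is marked
# for translation, the font name is a plain str.
parsed = {"caption": gt_str("Filtering"), "font": "Tahoma"}

# What an rc2py-like step would write into the generated .py file.
source = "dialog = " + repr(parsed)
print(source)
# dialog = {'caption': _('Filtering'), 'font': 'Tahoma'}


def _(message):
    # Toy lookup standing in for gettext.
    return {"Filtering": "Filtrado"}.get(message, message)


# Importing the generated module amounts to executing that source
# with _ defined; the values come back as plain, translated str.
namespace = {"_": _}
exec(source, namespace)
print(namespace["dialog"])
```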


TRANSLATOR TOOLS

With this approach a translator would need:
	- Outlook with SB (binary dist is enough)
	- python (for the gettext tools)
	- the rcparse/rc2py tools
also highly recommended:
	- a free resource editor
and optionally:
	- SB source distro (I think it's not needed, really)

To translate:
	- the "all_the_messages_in_SB" file (I can provide it
	  for each new version of SB)
	- the dialogs.rc file for each new version of SB.


Still, I have to search for a tool or procedure that lets a
translator know which messages are not translated yet.
If you know of a smart comparison tool for .po files (gettext
message files), please tell me.
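
Failing that, a crude check might be enough.  The sketch below is
only an assumption about what would do the job: it lists msgids
whose msgstr is empty and deliberately ignores plural forms,
msgctxt and fuzzy flags:

```python
import ast


def untranslated(po_text):
    """Return the msgids whose msgstr is empty (minimal sketch)."""
    missing = []
    fields = {"msgid": None, "msgstr": None}
    current = None

    def flush():
        # The header entry has an empty msgid and is skipped.
        if fields["msgid"] and fields["msgstr"] == "":
            missing.append(fields["msgid"])
        fields["msgid"] = fields["msgstr"] = None

    for line in po_text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            flush()
            current = "msgid"
            fields[current] = ast.literal_eval(line[len("msgid "):])
        elif line.startswith("msgstr "):
            current = "msgstr"
            fields[current] = ast.literal_eval(line[len("msgstr "):])
        elif line.startswith('"') and current:
            # Continuation line of a multi-line string.
            fields[current] += ast.literal_eval(line)
    flush()
    return missing


sample = '''
msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\\n"

msgid "Spam"
msgstr "Correo basura"

msgid "Unsure"
msgstr ""
'''
print(untranslated(sample))  # ['Unsure']
```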


That's all for the time being.
Soon, I'm going to polish the changes I made and load the
corresponding patches to sf.net for your review.


Regards,
-Hernán.

PS: Please bear with my English.


