Marking translatable strings

Thu Sep 16 13:14:05 EDT 1999

>>>>> <pinard at iro.umontreal.ca> writes:

    > I had a real strange idea :-).  I first quickly dismiss it, but
    > it is so simple that I prefer to ponder it again, and share and
    > debate it a little first, maybe.

Not so strange!  There's a big push to internationalize Mailman
<www.list.org> and I've promised it as an important goal for the next
release.  Unfortunately I haven't had much time to work on Mailman
lately so my few tentative steps haven't even made it into the CVS
tree yet.  There's been discussion on the
mailman-developers at python.org list and you can peruse the archives for 
much of the current thinking.  Sorry I don't have time to give exact
URLs but the archives are available at:

    http://www.python.org/pipermail/mailman-developers

Just a few quick comments, and then a pointer to some code...

    > Python has no preprocessing, no special string syntax for
    > markability, and moreover, it has doc strings!  So, at first
    > glance, it looks difficult.

The first attempts in the Mailman community at marking strings
followed the same reasoning you did.  I don't like that approach
because I want to be able to use any string spelling for translatable
strings.  Anything else would be way too error-prone in a project that
has so many hands on it.

I considered lobbying Guido for another string prefix, say i"...", to
mark translatable strings, but decided against that for a number of
reasons.  The approach I think I will take is one that you also
suggested: using _("text") as the marker for translatable strings.
This has the advantage that `_' is a perfectly good name for a
function, which could be the identity function if, say, the end user
doesn't have the I18N stuff installed on their system.  But it could
just as easily be the name of the function that does the string
lookup, returning the translated string.  Yes, _ has the disadvantage
of already having meaning in interactive mode, but I wasn't too
concerned about that (maybe I'm being naive tho'; as I mentioned, I
haven't really gotten /that/ far yet).  If it's a problem, __("text")
(double underscore) would be fine.
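
To make the two roles of `_' concrete, here is a minimal sketch.  It is
only an illustration, not code from Mailman: the names identity and
make_lookup, and the one-entry catalog, are hypothetical, and a real
lookup function would consult a proper message catalog rather than a
dict.

```python
def identity(text):
    """Stand-in for `_` when no I18N support is installed."""
    return text


def make_lookup(catalog):
    """Build a `_` that looks strings up in a catalog (hypothetical helper).

    Strings missing from the catalog are returned unchanged.
    """
    def lookup(text):
        return catalog.get(text, text)
    return lookup


# Without I18N support, the marker is just the identity function.
_ = identity
plain = _("Subscribe")

# With a catalog (made up for this example), the same marker translates.
_ = make_lookup({"Subscribe": "S'abonner"})
translated = _("Subscribe")
untranslated = _("Unsubscribe")
```

Either way the calling code is spelled the same, so the marking can go
in long before any translations exist.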

What I /have/ done is start work on a pygettext.py script that is in
the spirit of GNU gettext, except that it knows a lot about Python
syntax, including the 9 billion ways to spell `string'.  pygettext
uses Python's standard tokenize module to scan through a file of
Python code, searching for _(text) matches, where text is any of
'text', "text", '''text''', etc.  The output is pretty close to .po
style, with strings normalized to .po format.  The output formatter
may still have some bugs in it.
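
For a feel of the token-based scanning described above, here is a
rough sketch.  It is not the pygettext.py source: the function name
extract_marked_strings is made up, it is written for a current Python 3
tokenize module, and it only collects the marked strings without
writing any .po output.

```python
import ast
import io
import tokenize


def extract_marked_strings(source):
    """Return the strings found inside _("...") calls in Python source."""
    found = []
    parts = []
    state = "idle"  # idle -> saw_marker -> in_call
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if state == "idle":
            if tok.type == tokenize.NAME and tok.string == "_":
                state = "saw_marker"
        elif state == "saw_marker":
            if tok.type == tokenize.OP and tok.string == "(":
                state = "in_call"
                parts = []
            else:
                state = "idle"
        elif tok.type == tokenize.STRING:
            # The tokenizer hands us the literal exactly as spelled;
            # literal_eval turns any plain spelling into its value.
            try:
                value = ast.literal_eval(tok.string)
            except (ValueError, SyntaxError):
                value = None
            if isinstance(value, str):
                parts.append(value)
            else:
                state = "idle"  # bytes or f-string: not a plain string
        elif tok.type in (tokenize.NL, tokenize.COMMENT):
            continue  # line breaks and comments may sit between literals
        elif tok.type == tokenize.OP and tok.string == ")" and parts:
            found.append("".join(parts))  # adjacent literals concatenate
            state = "idle"
        else:
            state = "idle"  # e.g. _(variable): nothing to extract
    return found


sample = (
    'print(_("Hello"))\n'
    "title = _('''Multi\nline''')\n"
    'msg = _("adjacent " "literals")\n'
    'skip = _(variable)\n'
    'raw = _(r"C:\\temp")\n'
)
marked = extract_marked_strings(sample)
```

Because the tokenizer does the hard work of recognizing string
literals, the scanner never needs to care which quoting style was used.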

Since I don't envision much time to hack on this in the near future,
I'll just go ahead and post the code on my pyware page:

    http://www.python.org/~bwarsaw/software/pyware.html

Give me about 10 minutes and then feel free to grab it.  I'd love to
get any comments or patches that you might come up with.

-Barry