[I18n-sig] Re: pygettext dilemma

Barry A. Warsaw barry@zope.com
Tue, 14 Aug 2001 23:53:11 -0400


>>>>> "FP" == François Pinard <pinard@iro.umontreal.ca> writes:

    >> Indeed!  BTW, I18N Mailman is coming along very nicely now.  I
    >> hope the 2.1 release will happen within the next few months.

    FP> I have a few friends who are impatiently waiting for this
    FP> release! :-)

Soon, soon!

    >> What I do in this situation is to temporarily bind _() to a
    >> no-op function so that the string is marked for extraction, but
    >> not translated in place.  E.g.

    >> import gettext

    >> def _(s): return s

    >> foo = _('extract this string but do not translate it yet')

    >> _ = gettext.gettext

    FP> No hurt intended, of course; you should know me better :-).
    FP> Let me stress, in a friendly way, that constructs like the above are ugly.
    FP> We should set up examples, that people could follow, in which
    FP> we rely on a single, common, widespread, unvarying
    FP> interpretation of _(TEXT), without having to look around each
    FP> time to see what it means, or set and reset its meaning.  The
    FP> above is a kludge that does not fit well with what I think is
    FP> good Python style.

No hurt taken, but I'll respectfully disagree. :) I think it's fine
Python style for deferring translation, and not confusing at all
because it is almost always localized around the site of the deferral.
But contrary to Tim's license plate, there /is/ more than one way to
do it. :)

pygettext.py supports a -k/--keyword flag, similar to xgettext, which
expands the list of function names marking translatable strings.
IIRC, gettext suggests binding N_() to gettext_noop() and then
extracting any string wrapped in N_().  So, if you prefer, you can
rewrite my example above to be:

    from gettext import gettext as _

    def N_(s): return s

    foo = N_('extract this string but do not translate it yet')

and then run pygettext.py with --keyword=N_

Hmm, maybe we should add "N_" as one of the default keywords?
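A minimal sketch of the N_() approach, showing both halves of a deferred translation: the no-op marker at definition time and the real lookup at display time.  The list of language names and the language_menu() helper are hypothetical, and gettext.NullTranslations (which returns every message unchanged) stands in for whatever catalog an application would actually install.

```python
import gettext

# Stand-in catalog: NullTranslations returns every message unchanged.
# A real application would load one with gettext.translation(...).
_ = gettext.NullTranslations().gettext


def N_(s):
    """No-op marker: flags s for extraction (pygettext.py --keyword=N_)."""
    return s


# Marked when the module is imported, but deliberately left untranslated,
# e.g. because the user's language may not be known yet (hypothetical list).
LANGUAGE_NAMES = [N_('English'), N_('French'), N_('Traditional Chinese')]


def language_menu():
    # The actual catalog lookup happens here, at display time.
    return [_(name) for name in LANGUAGE_NAMES]
```

Because N_() is an ordinary function, the strings stay plain strings until _() sees them; only the extraction tool needs to know that N_ is special.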

That points to a general philosophy of mine: pygettext.py should
mimic xgettext as much as makes sense, given the differences between C and
Python.  In this case _() works great for most at-site translation
markings, but for the very few that must be deferred, either the
rebind hack or the N_() marking should suffice.

    >> This works perfectly because Python doesn't suffer from the
    >> same deficiencies as C (i.e. the C pre-processor :).

    FP> I quite understand that "it works", but it still suffers a lot,
    FP> in both legibility and simplicity.

Again, I must respectfully disagree!

    >> | ''"""TEXT""" 8-quoted marked

    >> This has been brought up before, and I know that some people
    >> really like this approach.  I don't though, because 1) it is
    >> too magical; 2) the rules are arbitrary and hard to remember;
    >> 3) explicit is better than implicit.

    FP> As long as `pygettext.py' (or `xgettext' or `xpot') is involved,
    FP> there is some unavoidable magic somewhere.  Even _(TEXT) does
    FP> not give much clue to a newcomer about the mandatory
    FP> extraction process.

This is true.  But it's still clearer that there is /some/ reason for
marking the string with _(), because you can quickly trace your way
back to gettext.gettext(), and then the connection to the runtime
translation process is obvious <wink>, if not the connection to the
extraction process.

Which leads me to another question: are you saying that ''"""Text"""
should be used for both the runtime translating and the extraction
marking?  If so, I don't see how that could work.  Even if you could
make it work, I'd still much prefer having a Real Python Function do the
runtime translation.  An example of why is what I really do in
Mailman...

Say I have the following string that needs to be translated:

    _('No such list %s found on host %s') % (listname, hostname)

Now we all know that this won't do as a source string, because some
languages may need to change the order of the variables, so we
really need to write the string like so:

    _('No such list %(listname)s found on host %(hostname)s') % {
        'listname': listname,
        'hostname': hostname
        }
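To see why the named form matters, here is a small illustration.  The "translated" template is invented for the example; it simply puts the host before the list name, as another language's word order might.

```python
listname, hostname = 'mailman-users', 'example.com'

# Hypothetical translations whose word order puts the host first.
positional_t = 'On host %s there is no list named %s'
named_t = 'On host %(hostname)s there is no list named %(listname)s'

# With a tuple, values are consumed left to right, so they land in the
# wrong slots once a translator reorders the placeholders.
wrong = positional_t % (listname, hostname)

# With a dict, each placeholder looks up its own value by name.
right = named_t % {'listname': listname, 'hostname': hostname}

print(wrong)  # On host mailman-users there is no list named example.com
print(right)  # On host example.com there is no list named mailman-users
```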

I've found this style to be quite pervasive, but also extremely (and
unnecessarily) repetitive.  Notice that I've typed "listname" and
"hostname" a total of six times.  Wouldn't it be wonderful if I only
needed to type them once:

    _('No such list %(listname)s found on host %(hostname)s')

?  Yes, it's great because -- to me -- I'm trading a modicum of
specialness for a huge raft of simplicity and legibility.  It really
does make the code easier to read, I claim (although it would be
interesting to know what others who have hacked on the Mailman 2.1
code think).  How do I make this work?

The trick is that the function _() isn't gettext.gettext() but a
wrapper around that library function that's unique to Mailman.  In
fact, you won't see many "import gettext"'s in the Mailman code, but
you will see lots of "from Mailman.i18n import _".

My _() actually uses sys._getframe() -- where available -- to get the
locals and globals one stack frame up from the _() frame, and then
automatically interpolates that dictionary into the translatable
string.  Is that magic?  Yes, a bit, but it's magic that is easily
revealed by finding the import, and viewing the Mailman.i18n module.
And once learned, I claim that it's immediately ingrained and needn't
be learned again.

But you might disagree, and use the more verbose approach for your
app.  No problem there!  Having a function call that can be
specialized in the Pythonic way serves both purposes well.

    FP> About the idiom of prefixing a string with two quotes of the
    FP> other kind, I find it quite easy to explain and remember.

I had to really think about the rule, as opposed to the example, in
your original message.  I think your rule goes: prepend the string you
want to extract with an empty string quoted with the alternative
quoting characters from the string you want to extract.  Or something
like that. :)
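For comparison, here is roughly what recognizing that rule could look like on the extraction side, using only the standard tokenize module.  This is a hypothetical sketch and makes no claim about how xpot or any other existing tool implements it.  It also shows why the marker must use the other quote character: written with the same one, "" followed by "TEXT" would read as the opening """ of a triple-quoted string.

```python
import ast
import io
import tokenize


def find_empty_prefixed(source):
    """Return the text of literals written as ''"TEXT" or ""'TEXT'.

    Hypothetical sketch: only unprefixed string literals are considered,
    and any token (including a line break) between marker and text
    breaks the match.
    """
    found = []
    prev = None  # the previous token, if it was a string literal
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            prev = None
            continue
        if prev is not None and prev.string in ("''", '""'):
            quote = tok.string[0]
            # The text must open with the *other* quote character.
            if quote in ("'", '"') and quote != prev.string[0]:
                found.append(ast.literal_eval(tok.string))
        prev = tok
    return found
```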

But there is another problem: for some fonts in some IDEs it can be
challenging to discern ' from " or even ` and having something like
""'''...''' makes it even more difficult to visually pick out.

    >> Seeing something like an unadorned ""'Traditional Chinese'
    >> really gives no clue as to the purpose of this strange markup,

    FP> In my opinion, using _(TEXT) after having temporarily
    FP> redefined _() as the identity function is equally opaque.  It
    FP> only acquires meaning for a user after s/he learns about the
    FP> extraction process, you just cannot make it evident.  The
    FP> explanation is unavoidable, anyway.  Redefining _() is a
    FP> formidable stunt.  Concatenating an empty string is much
    FP> simpler and cleaner.

Let me see if I can sum up my objection: you have to use a function
call anyway to do the actual runtime translation.  Since at-site
translations will be the overwhelming majority of examples, so will
_() markings.  For all those cases, you won't need
empty-string concatenation anyway.  For the handful of cases where you
need to defer translation, I prefer using a technique as similar to
the common way as possible, instead of introducing an entirely
different convention.

But I wouldn't cry foul if you encouraged N_() markings for deferred
translations.

    >> Or, you can sometimes do something ugly like use explicit

    >> __doc__ = _('Here is a module docstring')

    >> Not pretty, but also not common I think, so it doesn't concern
    >> me much.

    FP> Let's avoid being ugly, as far as we can.  Keep in mind that
    FP> you are opening a way, here, and setting up examples and
    FP> methods that will stick, and have consequences.  (One never
    FP> knows.  When I started to use `_' instead of explicit
    FP> `gettext' calls, most people were reluctant, and told me that
    FP> it would break so many C compilers that I should give up
    FP> now; Richard Stallman just refused to see GNU standards
    FP> suggesting it; but I used it nevertheless and for many
    FP> packages, to the point it stuck somewhat; nowadays, many
    FP> languages spontaneously use conventions similar to it.)

And I think it's a wonderful convention!  I'm glad you came up with
it, and I happily adopted it for Python.  It's beautiful. :)

I won't disagree that the __doc__ hack is ugly.  The more I think
about it, the more I think a magic comment in front of the docstring is the way
to go.  I'm not yet sure whether something like

    # noextract
    '''This is a docstring that need not be translated.'''

or

    # extract
    '''This is a docstring that should be translated.'''

is better, or whether there's some other better comment keyword to
use.  This would be worth experimenting with a bit.
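A quick sketch of how an extractor might honor the "# noextract" variant, using the standard ast module to find module, class and function docstrings and then checking the source line just above each one.  The marker text and the extractable_docstrings() name are hypothetical; an opt-in "# extract" rule would just invert the final test.

```python
import ast

DEFINITIONS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def extractable_docstrings(source, skip_marker="# noextract"):
    """Return docstrings NOT immediately preceded by skip_marker.

    The marker comment is hypothetical; this sketches the opt-out variant.
    """
    lines = source.splitlines()
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DEFINITIONS) or not node.body:
            continue
        first = node.body[0]
        is_docstring = (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        )
        if not is_docstring:
            continue
        # ast line numbers are 1-based, so the line above is lines[lineno - 2].
        above = lines[first.lineno - 2].strip() if first.lineno > 1 else ""
        if above != skip_marker:
            found.append(first.value.value)
    return found
```

A comment placed just before a docstring does not stop it from being a docstring, so this convention costs nothing at runtime.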

    FP> My point is that you should look forward and a little beyond
    FP> the immediate needs.  Even if it does not concern you much, let's
    FP> try to do well.

Agreed!

    >> I appreciate the suggestions Francois!  I think what we've got
    >> gives us the best approach for Python programs.

    FP> I would not want to crusade inordinately over this, and I'm
    FP> not really trying to punch _my_ own suggestions through.
    FP> Really not!  On the other hand, I would like to convince you
    FP> that temporarily overriding _(), or assigning the __doc__
    FP> attribute directly, just _cannot_ be "the best approach".

Let's not conflate what we're talking about.  One situation is
deferred translation, the other is docstring extraction marking.  For
the former, I'm completely happy with rebinding _(), although I
wouldn't squawk if you pushed for N_() <wink>.  For the latter, I
agree that explicit __doc__ binding is gross and we should avoid
it.  Here, I think the special comment is the way to go, but I'm not
sure about the details.  Please let's keep these two issues separate!

    FP> We should do better than that.  My suggestion does better
    FP> already, but I see we do not agree on this, a bit sadly...  I
    FP> surely do not mind if someone comes up with something even better
    FP> than what we both suggest, and do hope it happens!  But we
    FP> should at least come up with something as good.

A good, lively debate.  Thanks!

Cheers,
-Barry