[I18n-sig] Re: pygettext.py extraction of docstrings

Bruno Haible haible@ilog.fr
Wed, 15 Aug 2001 18:39:58 +0200 (CEST)


Barry A. Warsaw writes:
>     BH> The GNU gettext tools are currently being modified to handle
>     BH> various programming languages. A new flag 'python-format' is
>     BH> being introduced, with appropriate format string checking in
>     BH> 'msgfmt'.
> 
> I'm not sure exactly what this means.  Can you give a bit more detail?

When a Python program contains a string like "%(name)s, %(firstname)s",
xgettext will mark it as "#, python-format" to tell the translator
that the string is a format string. If the translator then gives an
incorrect translation, say "%(fistname)s %(name)s" (misspelled key) or
"%(firstname) %(name)s" (missing conversion character), then
"msgfmt --check" will give an appropriate error message.
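
To illustrate the kind of check meant here, a minimal sketch in Python.
It is not msgfmt's actual implementation; the names named_directives
and check_translation are made up for this example, and the regular
expression is a simplification that ignores "%%" escapes.

```python
import re

# Matches a named directive such as %(name)s or %(count)03d.
# The conversion part is optional so that a truncated directive
# like "%(firstname)" is still found and can be reported.
_DIRECTIVE = re.compile(
    r"%\((?P<key>[^)]*)\)"
    r"(?P<spec>[#0\- +]*\d*(?:\.\d+)?[diouxXeEfFgGcrs])?"
)


def named_directives(text):
    """Map each key used in text to its conversion spec (None if missing)."""
    return {m.group("key"): m.group("spec") for m in _DIRECTIVE.finditer(text)}


def check_translation(msgid, msgstr):
    """Return a list of problems found in msgstr, empty if it looks fine."""
    wanted = named_directives(msgid)
    problems = []
    for key, spec in named_directives(msgstr).items():
        if spec is None:
            problems.append("%%(%s) has no conversion character" % key)
        elif key not in wanted:
            problems.append("%%(%s) does not occur in msgid" % key)
    return problems


MSGID = "%(name)s, %(firstname)s"
for candidate in ("%(firstname)s %(name)s",
                  "%(fistname)s %(name)s",
                  "%(firstname) %(name)s"):
    print(repr(candidate), check_translation(MSGID, candidate))
```

The first candidate only reorders the keys, which is fine for named
directives; the other two each produce one problem report.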

> BTW, here's the current set of switches for pygettext.py.  Do you see
> any glaring incompatibilities with your latest xgettext?

pygettext doesn't extract comments of the form
"translator: the c, is a c-cedilla" (xgettext option --add-comments) or
"xgettext: no-python-format" (lets the programmer override the format
string guessing).
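
For illustration, this is roughly what such comments look like in a
Python source file. The two messages are invented for this example,
and the first comment is only picked up if xgettext is told about the
tag, e.g. with --add-comments=translator (an assumption about how it
would be invoked).

```python
from gettext import gettext as _

# translator: the c, is a c-cedilla
label = _("Franc,ois")

# The text below contains "% d", which a format-string heuristic could
# mistake for a directive, so the programmer overrides the guess.
# xgettext: no-python-format
note = _("100% done")

print(label)
print(note)
```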

Other than that: xgettext doesn't have --docstrings and
--no-docstrings yet :-)

The -K option doesn't exist in xgettext; you have to use --keyword
instead.

Also, xgettext doesn't have -S/--style. The Solaris style is available
only with --strict.

> But as Martin points out, a case can be made for translating even
> class, function, and method docstrings.  Think of the situation where
> manuals are automatically extracted from source code, a la Javadoc.  I
> believe you'd want those strings to be extracted into the catalog.

I believe those strings belong in a different catalog. If you then
want them in the same catalog, you can use "msgcat" to combine both
catalogs.

The reasons for a different catalog: 1) Normal strings and docstrings
may need to be handled by different translators. 2) They may need
different extraction options. Your addition of --no-docstrings
indicates that docstrings may come from a different set of
files. Instead of forcing all options into a single xgettext command
line, what I propose is that you call xgettext twice, once for the
normal strings and once for the docstrings, with independent command
line options, and on independent (but potentially overlapping) sets of
files. This gives you the maximum flexibility.
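
The idea of two independent passes can be pictured with a rough Python
sketch. It uses the standard ast module to keep marked strings and
docstrings apart; split_messages and SAMPLE are hypothetical names, and
this is not how pygettext.py or xgettext are implemented.

```python
import ast

_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def split_messages(source, keyword="_"):
    """Return (marked, docstrings) found in one Python source string."""
    marked, docstrings = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DOC_OWNERS):
            # Pass 2 material: docstrings of modules, classes, functions.
            doc = ast.get_docstring(node, clean=False)
            if doc is not None:
                docstrings.append(doc)
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id == keyword
              and node.args
              and isinstance(node.args[0], ast.Constant)
              and isinstance(node.args[0].value, str)):
            # Pass 1 material: literal strings marked with the keyword.
            marked.append(node.args[0].value)
    return marked, docstrings


SAMPLE = '''"""Module help."""

class Greeter:
    """Class help."""

    def greet(self):
        """Method help."""
        return _("Hello")
'''

marked, docstrings = split_messages(SAMPLE)
print("catalog 1 (normal strings):", marked)
print("catalog 2 (docstrings):", sorted(docstrings))
```

Each list would go into its own catalog, and the two can still be
merged afterwards if a single catalog is wanted.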

> Here are two strawmen:
> 
> 1) pygettext.py and xgettext never extract unmarked docstrings unless
>    the -D/--docstrings option is given.  If -D is given then all
>    unmarked docstrings are extracted along with all other normally
>    marked text, unless the unmarked docstring is immediately preceded
>    by a comment with the word "notranslate" as the first word in the
>    comment.  All other words in the comment are ignored.
> 
> 2) pygettext.py and xgettext never extract unmarked docstrings unless
>    they are immediately preceded by a comment with the word
>    "translate" as the first word in the comment.  All other words in
>    the comment are ignored.

Here is my strawman:

  pygettext.py and xgettext never extract unmarked docstrings by
  default. If option -D/--docstrings is given, they extract docstrings
  only. A separate option like --keywords can be used to select or
  inhibit the docstrings.

> I think that Martin's and my applications show that we probably need
> to cover these two situations:
>
> 1) No docstrings are extracted unless they are preceded by a magic
>    "extract" comment.
>
> 2) All docstrings are extracted unless they are preceded by a magic
>    "noextract" comment.

I agree.
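
The two situations differ only in the default. A small Python sketch of
the decision, assuming the magic words are literally "extract" and
"noextract" and must be the first word of the comment (should_extract
is a hypothetical helper, not part of either tool):

```python
def should_extract(preceding_comment, extract_by_default):
    """Decide whether a docstring goes into the catalog.

    preceding_comment is the text of the comment immediately before the
    docstring (with or without the leading '#'), or None if there is none.
    """
    first_word = ""
    if preceding_comment:
        words = preceding_comment.lstrip("#").split()
        if words:
            first_word = words[0]
    if extract_by_default:
        # Situation 2: everything is extracted unless opted out.
        return first_word != "noextract"
    # Situation 1: nothing is extracted unless opted in.
    return first_word == "extract"


print(should_extract("# extract this for the manual", False))
print(should_extract(None, False))
print(should_extract("# noextract internal helper", True))
print(should_extract(None, True))
```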

> Note that in Python, unlike I believe as in *Lisp, docstrings can be
> attached to objects other than functions.  It's common to have both
> module and class docstrings.

Lisp has grown up since then. Nowadays you can attach docstrings not
only to functions and macros, but also to classes, methods and packages.
The macros defclass, defmethod and defpackage support this.

Bruno