[I18n-sig] Re: pygettext.py extraction of docstrings

Barry A. Warsaw barry@wooz.org
Wed, 15 Aug 2001 00:10:24 -0400

>>>>> "BH" == Bruno Haible <haible@ilog.fr> writes:

    BH> Well, Common Lisp has had docstrings long before Emacs-Lisp
    BH> and Python. Their purpose is to have documentation available
    BH> for the programmer, in a running session, regardless of where
    BH> each class or function came from.

    BH> Now, why do you want to translate them?

In my Python experience, there is one common situation where you want
to translate docstrings.  Note that in Python, unlike (I believe) in
*Lisp, docstrings can be attached to objects other than functions.
It's common to have both module and class docstrings.  Again, IME
class docstrings serve similar audiences to function/method
docstrings, i.e. the programmer.  It is a common idiom in Python to
use module docstrings as usage text for command line scripts, and
those are definitely intended for the end user, and must be translated.

I've had occasion to use class docstrings as strings for the user too,
although I won't claim that's wonderful Python style.

But as Martin points out, a case can be made for translating even
class, function, and method docstrings.  Think of the situation where
manuals are automatically extracted from source code, a la Javadoc.  I
believe you'd want those strings to be extracted into the catalog.
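To make the Javadoc case concrete: a tool that builds a manual from live
objects would pass each docstring through _(), and that only helps if the
docstrings are in the catalog.  A rough sketch, where `Greeter`,
`translated_doc` and `manual_for` are made-up names and NullTranslations
again stands in for a real catalog:

```python
import gettext
import inspect

# Stand-in catalog (identity); a real tool would load a translated one.
_ = gettext.NullTranslations().gettext


class Greeter:
    """Greet people by name."""

    def hello(self, name):
        """Return a friendly greeting for NAME."""
        return f"Hello, {name}!"


def translated_doc(obj):
    # Look up the raw docstring (the msgid an extractor would record),
    # then tidy the indentation of whatever comes back for display.
    raw = obj.__doc__
    return inspect.cleandoc(_(raw)) if raw else None


def manual_for(cls):
    # One line for the class, then one indented line per documented method.
    lines = [f"{cls.__name__}: {translated_doc(cls)}"]
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        doc = translated_doc(member)
        if doc:
            lines.append(f"  {name}: {doc}")
    return "\n".join(lines)
```

Translating the raw `__doc__` before cleaning it matters: an extractor
records the docstring as written, so a cleaned-up string would not match
any msgid.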

    BH> As a consequence for gettext, I could live with an xgettext
    BH> option --docstrings which extracts *only* the docstrings of a
    BH> set of source files.

I made the semantics for pygettext.py's --docstrings/-D option to
extract /also/ the docstrings because the older version of msgmerge I
am using can't merge a docstring-only catalog with a normal-string
catalog in a reasonable way (I tried).  And as stated above, module
docstrings can serve exactly the same audience as other translatable
strings, i.e. the end user, so they should be in the same catalog.
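In other words, -D only adds a second source of msgids, and both kinds land
in one catalog.  A minimal sketch of that behaviour using the modern `ast`
module (Python 3.8+).  This is not how pygettext.py itself is written;
`extract()` and its parameters are invented for illustration.

```python
import ast

DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def extract(source, filename="<string>", docstrings=False, keywords=("_",)):
    """Map each msgid to the (filename, lineno) pairs where it occurs."""
    catalog = {}

    def add(msgid, lineno):
        catalog.setdefault(msgid, []).append((filename, lineno))

    for node in ast.walk(ast.parse(source, filename)):
        # Marked strings: a keyword call such as _("text") with a literal argument.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in keywords
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            add(node.args[0].value, node.lineno)
        # Unmarked docstrings go into the same catalog, but only with -D.
        elif docstrings and isinstance(node, DOCSTRING_OWNERS):
            doc = ast.get_docstring(node, clean=False)
            if doc is not None:
                add(doc, node.body[0].lineno)
    return catalog
```

The docstrings are kept raw (`clean=False`) so that each msgid matches the
`__doc__` value the program will pass to _() at run time.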

But I was also forced to add a very inelegant -X/--exclude-file switch
which suppressed docstring extraction for the listed files.  While that
served my purpose, it's a gross hack, and not just because it doesn't
provide the necessary granularity.

More productive I think would be for us to agree on a convention for
extracting docstrings that doesn't require both -D and -X.  Here are
two strawmen:

1) pygettext.py and xgettext never extract unmarked docstrings unless
   the -D/--docstrings option is given.  If -D is given then all
   unmarked docstrings are extracted along with all other normally
   marked text, unless the unmarked docstring is immediately preceded
   by a comment with the word "notranslate" as the first word in the
   comment.  All other words in the comment are ignored.

2) pygettext.py and xgettext never extract unmarked docstrings unless
   they are immediately preceded by a comment with the word
   "translate" as the first word in the comment.  All other words in
   the comment are ignored.
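Here's roughly how an extractor might apply either strawman.  It's only a
sketch, assuming "immediately preceded" means a comment on its own line
directly above the docstring's first line, and that the word match is
exact; the function names are made up.

```python
import ast

DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def comment_word_above(lines, lineno):
    """First word of a whole-line comment directly above 1-based LINENO, else None."""
    if lineno < 2:
        return None
    text = lines[lineno - 2].strip()
    if not text.startswith("#"):
        return None
    words = text[1:].split()
    return words[0] if words else None


def docstrings_to_extract(source, strawman, docstrings_option=False):
    """Return the unmarked docstrings that strawman 1 or 2 would extract."""
    lines = source.splitlines()
    selected = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DOCSTRING_OWNERS):
            continue
        doc = ast.get_docstring(node, clean=False)
        if doc is None:
            continue
        word = comment_word_above(lines, node.body[0].lineno)
        if strawman == 1:
            # Opt out: everything when -D is given, minus "# notranslate" ones.
            keep = docstrings_option and word != "notranslate"
        elif strawman == 2:
            # Opt in: only "# translate" docstrings, no option needed.
            keep = word == "translate"
        else:
            raise ValueError("strawman must be 1 or 2")
        if keep:
            selected.append(doc)
    return selected
```

With a module docstring marked "# translate", a helper marked
"# notranslate" and an unmarked class docstring, strawman 1 with -D picks
the module and class docstrings, strawman 1 without -D picks nothing, and
strawman 2 picks only the module docstring.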

Feel free to knock these down. :)

    >> Perhaps Bruno can add some information on pygettext.py in the
    >> GNU gettext manual?

    BH> The GNU gettext tools are currently being modified to handle
    BH> various programming languages. A new flag 'python-format' is
    BH> being introduced, with appropriate format string checking in
    BH> 'msgfmt'.

I'm not sure exactly what this means.  Can you give a bit more detail?

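My guess is that msgfmt would compare the % conversions in each msgid with
those in its msgstr, and reject a translation that drops, adds, or reorders
one.  Something like the following, where the regular expression is my own
rough approximation of Python's % syntax (not msgfmt's actual rules) and
the function names are invented:

```python
import re
from collections import Counter

# Rough pattern for one %-conversion: optional "(name)", flags, width,
# precision, an ignored length modifier, and the conversion type.
CONVERSION = re.compile(
    r"%(?:\((?P<name>[^)]*)\))?"
    r"[-#0 +]*(?:\*|\d+)?(?:\.(?:\*|\d+))?[hlL]?"
    r"(?P<type>[diouxXeEfFgGcrsa%])"
)


def conversions(fmt):
    """List (name, type) for each conversion in FMT, skipping literal %%."""
    return [
        (m.group("name"), m.group("type"))
        for m in CONVERSION.finditer(fmt)
        if m.group("type") != "%"
    ]


def formats_compatible(msgid, msgstr):
    """True if MSGSTR uses the same conversions that MSGID does."""
    source, target = conversions(msgid), conversions(msgstr)
    if any(name is not None for name, _type in source):
        # Named conversions may be reordered by the translator.
        return Counter(source) == Counter(target)
    # Positional conversions must stay in the same order.
    return source == target
```

So "%d files in %s" could become "%d Dateien in %s" but not
"%s Dateien in %d", while "%(count)d files in %(dir)s" may be freely
reordered.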
    BH> xgettext will also have a Python backend, making pygettext
    BH> obsolete (except for docstring extraction, for the time
    BH> being).

That'd be great.  It'll be even cooler if we can agree on a convention
for docstring extraction!

BTW, here's the current set of switches for pygettext.py.  Do you see
any glaring incompatibilities with your latest xgettext?


-------------------- snip snip --------------------
Usage: pygettext [options] inputfile ...


    -a
    --extract-all
        Extract all strings.

    -d name
        Rename the default output file from messages.pot to name.pot.

    -E
    --escape
        Replace non-ASCII characters with octal escape sequences.

    -D
    --docstrings
        Extract module, class, method, and function docstrings.  These do not
        need to be wrapped in _() markers, and in fact cannot be for Python to
        consider them docstrings. (See also the -X option).

    -h
    --help
        Print this help message and exit.

    -k word
        Keywords to look for in addition to the default set, which are:

        You can have multiple -k flags on the command line.

    -K
    --no-default-keywords
        Disable the default set of keywords (see above).  Any keywords
        explicitly added with the -k/--keyword option are still recognized.

    --no-location
        Do not write filename/lineno location comments.

    -n
    --add-location
        Write filename/lineno location comments indicating where each
        extracted string is found in the source.  These lines appear before
        each msgid.  The style of comments is controlled by the -S/--style
        option.  This is the default.

    -o filename
        Rename the default output file from messages.pot to filename.  If
        filename is `-' then the output is sent to standard out.

    -p dir
        Output files will be placed in directory dir.

    -S stylename
    --style stylename
        Specify which style to use for location comments.  Two styles are
        supported:

        Solaris  # File: filename, line: line-number
        GNU      #: filename:line

        The style name is case insensitive.  GNU style is the default.

    -v
    --verbose
        Print the names of the files being processed.

    -V
    --version
        Print the version of pygettext and exit.

    -w columns
        Set width of output to columns.

    -x filename
        Specify a file that contains a list of strings that are not to be
        extracted from the input files.  Each string to be excluded must
        appear on a line by itself in the file.

    -X filename
        Specify a file that contains a list of files (one per line) that
        should not have their docstrings extracted.  This is only useful in
        conjunction with the -D option above.

If `inputfile' is -, standard input is read.
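For comparison with xgettext's output, here is what the two -S styles above
amount to.  Just a sketch of the comment formats (`location_comment` is a
made-up name, not pygettext.py's code); it writes one comment per location,
though a tool may well pack several locations onto one line.

```python
def location_comment(filename, lineno, style="GNU"):
    """Format one location comment in the given style (case-insensitive)."""
    style = style.lower()
    if style == "gnu":
        return f"#: {filename}:{lineno}"
    if style == "solaris":
        return f"# File: {filename}, line: {lineno}"
    raise ValueError(f"unknown style: {style!r}")


# Each location comment is written just before the msgid it refers to.
entry = "\n".join([
    location_comment("frob.py", 42, "Solaris"),
    'msgid "Extract all strings."',
    'msgstr ""',
])
```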