[I18n-sig] Re: pygettext.py extraction of docstrings
Barry A. Warsaw
barry@wooz.org
Wed, 15 Aug 2001 00:10:24 -0400
>>>>> "BH" == Bruno Haible <haible@ilog.fr> writes:
BH> Well, Common Lisp has had docstrings long before Emacs-Lisp
BH> and Python. Their purpose is to have documentation available
BH> for the programmer, in a running session, regardless where
BH> each class or function came from.
BH> Now, why do you want to translate them?
In my Python experience, there is one common situation where you want
to translate docstrings. Note that in Python, unlike (I believe) in
*Lisp, docstrings can be attached to objects other than functions.
It's common to have both module and class docstrings. Again, IME
class docstrings serve similar audiences to function/method
docstrings, i.e. the programmer. But it is a common idiom in Python to
use module docstrings as usage text for command line scripts, and
those are definitely intended for the end user, and must be
translated.
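To make the idiom concrete, here is a minimal sketch. The script name and the "frob" domain are made up for illustration, and usage_text() is a hypothetical helper, not something from pygettext.py. With no catalog installed, fallback=True just hands back the original text.

```python
"""Usage: frob [options] inputfile ...

Options:
    -h, --help    Print this help message and exit.
"""
import gettext

# "frob" is a hypothetical domain name; fallback=True means a missing
# catalog yields the untranslated string instead of raising an error.
_ = gettext.translation("frob", fallback=True).gettext


def usage_text(doc):
    # The module docstring doubles as the usage message, so it goes
    # through the catalog like any other user-visible string.
    return _(doc)


if __name__ == "__main__":
    print(usage_text(__doc__))
```

Note that the docstring itself cannot be wrapped in _() in the source, which is why the extractor has to be told to pick it up.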
I've had occasion to use class docstrings as strings for the user too,
although I won't claim that's wonderful Python style.
But as Martin points out, a case can be made for translating even
class, function, and method docstrings. Think of the situation where
manuals are automatically extracted from source code, a la Javadoc. I
believe you'd want those strings to be extracted into the catalog.
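As a rough illustration of what such an extraction pass has to find, here is a small sketch that collects module, class, and function/method docstrings using the standard ast module. The name collect_docstrings is hypothetical, and this is not how pygettext.py itself is written.

```python
import ast


def collect_docstrings(source):
    # Return a list of (kind, name, docstring) for every documented object.
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Module):
            kind, name = "module", "<module>"
        elif isinstance(node, ast.ClassDef):
            kind, name = "class", node.name
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind, name = "function", node.name
        else:
            continue
        doc = ast.get_docstring(node)
        if doc is not None:
            found.append((kind, name, doc))
    return found


SAMPLE = '''"""Module doc."""

class Spam:
    """Class doc."""

    def eggs(self):
        """Method doc."""

def bare():
    pass
'''

for kind, name, doc in collect_docstrings(SAMPLE):
    print(kind, name, repr(doc))
```

A manual generator and a message extractor would both start from a list like this; they differ only in what they do with each entry.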
BH> As a consequence for gettext, I could live with an xgettext
BH> option --docstrings which extracts *only* the docstrings of a
BH> set of source files.
I made the semantics for pygettext.py's --docstrings/-D option to
extract /also/ the docstrings because the older version of msgmerge I
am using can't merge a docstring-only catalog with a normal-string
catalog in a reasonable way (I tried). And as stated above, module
docstrings can serve exactly the same audience as other translatable
strings, i.e. the end user, so they should be in the same catalog.
But I was also forced to add a very inelegant -X/--no-docstrings switch
which suppresses docstring extraction for the listed files. While that
served my purpose, it's a gross hack, and not just because it doesn't
provide the necessary granularity.
More productive I think would be for us to agree on a convention for
extracting docstrings that doesn't require both -D and -X. Here are
two strawmen:
1) pygettext.py and xgettext never extract unmarked docstrings unless
the -D/--docstrings option is given. If -D is given then all
unmarked docstrings are extracted along with all other normally
marked text, unless the unmarked docstring is immediately preceded
by a comment with the word "notranslate" as the first word in the
comment. All other words in the comment are ignored.
2) pygettext.py and xgettext never extract unmarked docstrings unless
they are immediately preceded by a comment with the word
"translate" as the first word in the comment. All other words in
the comment are ignored.
Feel free to knock these down. :)
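To make the two strawmen easier to compare, here is a sketch of both rules. All the names in it (marker_before, docstrings, strawman1, strawman2) are hypothetical, "immediately preceded" is taken to mean "a comment on the line directly above the docstring", and the first word must match exactly. It is a sketch under those assumptions, not pygettext.py or xgettext code.

```python
import ast

DOC_NODES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def marker_before(lines, lineno):
    # First word of a comment on the line just above lineno, else None.
    if lineno < 2:
        return None
    above = lines[lineno - 2].strip()
    if not above.startswith("#"):
        return None
    words = above.lstrip("#").split()
    return words[0] if words else None


def docstrings(source):
    # Yield (docstring, marker) for each module/class/function docstring.
    lines = source.splitlines()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DOC_NODES) or not node.body:
            continue
        first = node.body[0]
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            yield first.value.value, marker_before(lines, first.lineno)


def strawman1(source, extract_docstrings):
    # Strawman 1: with -D, take every docstring not marked notranslate.
    if not extract_docstrings:
        return []
    return [doc for doc, marker in docstrings(source)
            if marker != "notranslate"]


def strawman2(source):
    # Strawman 2: take only the docstrings marked translate.
    return [doc for doc, marker in docstrings(source)
            if marker == "translate"]


SAMPLE = '''\
# translate usage text for the end user
"""Usage: spam [options]"""

class Spam:
    # notranslate
    """Internal notes."""

def eggs():
    """Plain docstring."""
'''

print(strawman1(SAMPLE, True))
print(strawman2(SAMPLE))
```

The difference shows up on the unmarked docstring: strawman 1 extracts it whenever -D is given, while strawman 2 never does.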
>> Perhaps Bruno can add some information on pygettext.py in the
>> GNU gettext manual?
BH> The GNU gettext tools are currently being modified to handle
BH> various programming languages. A new flag 'python-format' is
BH> being introduced, with appropriate format string checking in
BH> 'msgfmt'.
I'm not sure exactly what this means. Can you give a bit more detail?
BH> xgettext will also have a Python backend, making pygettext
BH> obsolete (except for docstring extraction, for the time
BH> being).
That'd be great. It'll be even cooler if we can agree on a convention
for docstring extraction!
BTW, here's the current set of switches for pygettext.py. Do you see
any glaring incompatibilities with your latest xgettext?
Cheers,
-Barry
-------------------- snip snip --------------------
Usage: pygettext [options] inputfile ...
Options:
-a
--extract-all
Extract all strings.
-d name
--default-domain=name
Rename the default output file from messages.pot to name.pot.
-E
--escape
Replace non-ASCII characters with octal escape sequences.
-D
--docstrings
Extract module, class, method, and function docstrings. These do not
need to be wrapped in _() markers, and in fact cannot be for Python to
consider them docstrings. (See also the -X option).
-h
--help
Print this help message and exit.
-k word
--keyword=word
Keywords to look for in addition to the default set, which are:
%(DEFAULTKEYWORDS)s
You can have multiple -k flags on the command line.
-K
--no-default-keywords
Disable the default set of keywords (see above). Any keywords
explicitly added with the -k/--keyword option are still recognized.
--no-location
Do not write filename/lineno location comments.
-n
--add-location
Write filename/lineno location comments indicating where each
extracted string is found in the source. These lines appear before
each msgid. The style of comments is controlled by the -S/--style
option. This is the default.
-o filename
--output=filename
Rename the default output file from messages.pot to filename. If
filename is `-' then the output is sent to standard out.
-p dir
--output-dir=dir
Output files will be placed in directory dir.
-S stylename
--style stylename
Specify which style to use for location comments. Two styles are
supported:
Solaris  # File: filename, line: line-number
GNU      #: filename:line
The style name is case insensitive. GNU style is the default.
-v
--verbose
Print the names of the files being processed.
-V
--version
Print the version of pygettext and exit.
-w columns
--width=columns
Set width of output to columns.
-x filename
--exclude-file=filename
Specify a file that contains a list of strings that are not to be
extracted from the input files. Each string to be excluded must
appear on a line by itself in the file.
-X filename
--no-docstrings=filename
Specify a file that contains a list of files (one per line) that
should not have their docstrings extracted. This is only useful in
conjunction with the -D option above.
If `inputfile' is -, standard input is read.
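For reference, here is a small sketch of what two of the options above produce: the -S/--style location comments and the -E/--escape octal escapes. The function names are hypothetical, and escaping the UTF-8 bytes of the string is an assumption on my part, since the usage text does not say which encoding is used.

```python
def location_comment(filename, lineno, style="GNU"):
    # Style names are case insensitive; GNU is the default.
    style = style.lower()
    if style == "gnu":
        return "#: %s:%d" % (filename, lineno)
    if style == "solaris":
        return "# File: %s, line: %d" % (filename, lineno)
    raise ValueError("unknown style: %r" % style)


def escape_non_ascii(text, encoding="utf-8"):
    # Assumption: escape each non-ASCII byte of the encoded string as a
    # three-digit octal sequence; ASCII bytes pass through unchanged.
    out = []
    for byte in text.encode(encoding):
        out.append(chr(byte) if byte < 128 else "\\%03o" % byte)
    return "".join(out)


print(location_comment("spam.py", 12))
print(location_comment("spam.py", 12, "Solaris"))
print(escape_non_ascii("caf\u00e9"))
```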