[Doc-SIG] Translation of Python documentation

Mon, 9 Jul 2001 17:32:09 +0200

At the European Python Meeting, we had a few sessions on translating
Python documentation. These were initiated by Benoit Lacherez, who
currently manages the French Python translations (frpython.sf.net).

The French translation group currently uses a script to mark up the
original documentation; the script copies the English text into
commented regions. The translator inserts the French translations
between these regions.

We have discussed versioning of the documentation to some extent, and
found three problems:

1. it is still unclear if and when the documentation will be converted
   to XML. Having XML might simplify the translation process to some
   degree, but it will also mean that the existing translations need
   to be converted, as well.

2. version tracking is quite a challenge. So far, the French
   translators have had problems when documentation moved from one
   file to another after 1.5.2. We anticipate further problems with
   version changes, such as:
   - the order of paragraphs or sections may change
   - changes might merely affect formatting (e.g. line breaking), but
     a plain diff will display the entire paragraph as changed

3. it might be desirable to offer "incomplete" translations, which
   offer translated text where it is available, and the English
   documentation for the rest.
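
To illustrate the formatting problem in item 2: a paragraph that was
only re-wrapped could be recognised by comparing whitespace-normalised
text instead of raw lines. This is a minimal sketch, not part of the
existing French tool chain; the function names are made up for the
example.

```python
def normalize(paragraph: str) -> str:
    """Collapse every run of whitespace (including line breaks) to one space."""
    return " ".join(paragraph.split())


def really_changed(old: str, new: str) -> bool:
    """True only if the wording differs, not merely the line breaking."""
    return normalize(old) != normalize(new)


# The same sentence, wrapped at different points.
old = "Capitalize the first character\nof the argument."
new = "Capitalize the first\ncharacter of the argument."

print(really_changed(old, new))  # False: only the line breaks moved
```

A check like this does not help with reordered paragraphs or sections;
for that, some stable identifier per element is needed.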

To solve these issues, we propose that
a) the conversion to XML is done sooner rather than later,
b) in the original documents, unique identifiers for sections
   and *desc elements are introduced. These identifiers can
   then be used in the translations to correlate the
   translations with the original text. This might look like

<funcdesc id='capitalize'>
  <signature>
    <name>capitalize</name>
    <args>word</args>
  </signature>
  <description>
    <para>Capitalize the first character of the argument.</para>
  </description>
</funcdesc>

   A script would need to check whether these are truly unique, and
   whether they are present in all places (and assign them if they
   aren't). I assume they can be used for cross-referencing, also.

c) some sort of versioning is used in the translations. It is not
   clear to me what the best approach would be; options include:
   - annotate each element that has an ID with the CVS revision
     number in which the element was last changed.
   - annotate each such element with a hash value of its contents.
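
The ID check from b) and the hash option from c) could be sketched in
Python roughly as follows. Everything here is an assumption made for
illustration: that "sections and *desc elements" means tags named
"section" or ending in "desc", the sample ids, the use of SHA-1, and
the "orig-hash" attribute name on the translated element. None of it
is an agreed format.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import Counter


def is_tracked(elem):
    # Assumption: only <section> and <...desc> elements carry ids.
    return elem.tag == "section" or elem.tag.endswith("desc")


def check_ids(root):
    """Return (ids used more than once, tracked elements lacking an id)."""
    tracked = [e for e in root.iter() if is_tracked(e)]
    counts = Counter(e.get("id") for e in tracked if e.get("id"))
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    missing = [e for e in tracked if not e.get("id")]
    return duplicates, missing


def content_hash(elem):
    """Hash the element's text with whitespace collapsed, so that pure
    re-wrapping does not change the value. Markup-only changes are
    deliberately ignored in this sketch."""
    text = " ".join("".join(elem.itertext()).split())
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def is_stale(original, translated):
    """A translation is stale if the hash it recorded (hypothetical
    'orig-hash' attribute) no longer matches the original's contents."""
    return translated.get("orig-hash") != content_hash(original)


SAMPLE = """
<section id='string-funcs'>
  <funcdesc id='capitalize'>
    <description>
      <para>Capitalize the first character of the argument.</para>
    </description>
  </funcdesc>
  <funcdesc>
    <description><para>A description without an id.</para></description>
  </funcdesc>
</section>
""".strip()

root = ET.fromstring(SAMPLE)
duplicates, missing = check_ids(root)
print("duplicate ids:", duplicates)
print("elements without id:", [e.tag for e in missing])

# A translated element that recorded the hash of the text it was made from.
original = root.find("funcdesc[@id='capitalize']")
translated = ET.Element(
    "funcdesc", {"id": "capitalize", "orig-hash": content_hash(original)}
)
print("stale:", is_stale(original, translated))
```

Compared with a CVS revision number, a content hash survives moving
an element to another file, but it cannot say how old the change is.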

Regards,
Martin