[I18n-sig] Re: Patch 101320: doc strings

François Pinard pinard@iro.umontreal.ca
04 Sep 2000 12:59:27 -0400

[Martin von Loewis]

> I have started translating the Python doc strings into German, and
> covered about 30% so far.  Using the Python 2 gettext.py, I did not
> experience any noticeable delay in loading the MO file on my 300MHz
> machine.  While I agree that lazy loading may become necessary, I think
> it is OK to implement the feature when the problem actually arises.
> I'm pretty certain you can implement lazy access without changing the
> existing API.

Excellent.  Just remember this if you ever read someone asking that we
split textual domains into "smaller" or "more manageable" parts.  We
should then correct implementations and tools as needed, rather than
give in to splitting or multiplying textual domains.
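To illustrate Martin's point that lazy loading needs no API change, here
is a minimal sketch, assuming Python 3's gettext module rather than the
Python 2 one discussed in this thread.  LazyTranslations and write_mo are
hypothetical names: the class keeps the usual gettext() and ngettext()
methods but only parses the MO file on the first lookup, and the helper
writes a tiny MO file (ASCII strings, no hash table) so the example can
run on its own.

```python
import gettext
import os
import struct
import tempfile


def write_mo(path, catalog):
    """Hypothetical helper: write a minimal little-endian MO file (no hash table)."""
    keys = sorted(catalog)
    ids, strs, entries = b"", b"", []
    for key in keys:
        k, v = key.encode("ascii"), catalog[key].encode("ascii")
        entries.append((len(k), len(ids), len(v), len(strs)))
        ids += k + b"\0"
        strs += v + b"\0"
    n = len(keys)
    orig_tab = 28                   # header is seven 32-bit words
    trans_tab = orig_tab + 8 * n    # each descriptor is (length, offset)
    ids_start = trans_tab + 8 * n
    strs_start = ids_start + len(ids)
    data = struct.pack("<7I", 0x950412DE, 0, n, orig_tab, trans_tab, 0, 0)
    data += b"".join(struct.pack("<2I", kl, ids_start + ko) for kl, ko, _, _ in entries)
    data += b"".join(struct.pack("<2I", vl, strs_start + vo) for _, _, vl, vo in entries)
    with open(path, "wb") as fp:
        fp.write(data + ids + strs)


class LazyTranslations(gettext.NullTranslations):
    """Same lookup methods as a translations object; parsing is deferred."""

    def __init__(self, path):
        super().__init__()
        self._path = path
        self._real = None           # GNUTranslations, created on first lookup

    def _load(self):
        if self._real is None:
            with open(self._path, "rb") as fp:
                self._real = gettext.GNUTranslations(fp)
        return self._real

    # Other methods (pgettext, ...) could be forwarded the same way.
    def gettext(self, message):
        return self._load().gettext(message)

    def ngettext(self, singular, plural, n):
        return self._load().ngettext(singular, plural, n)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "de.mo")
        write_mo(path, {"Hello": "Hallo"})
        t = LazyTranslations(path)  # nothing parsed yet
        print(t.gettext("Hello"))   # parses the file, prints Hallo
```

Callers see an ordinary translations object; only the moment of parsing
changes, which is why such a change could be made later without touching
the API.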

> The Python 2 mmap works on Unix and Win32.  It probably is the best
> solution if available.

Wow!  Good news.  With luck, it works on the Macintosh as well...

> > In my opinion, the solution might then be for these systems to load
> > the MO hash tables only, and then retrieve messages from disk.

> If you load the hash tables, does this give enough information so that
> you can use only two seek(2) calls, on average?  If so, it would
> probably be good if there were a) documentation for the hash table
> format, and/or b) an implementation of it in Python.

We could use the compendium of all existing PO files in the Translation
Project to gather statistics (I'm not rushing to do this today! :-).
My guess is that we could load the full hash tables into memory in one
quick swallow, and that double hashing would then guarantee a single
seek on average.

The precise hash algorithm is only documented in the sources.  Using it
from GNU `gettext' would raise questions about how the GPL applies, but
using the copy bought by the Danish UUG should be OK.  Best might be to
postpone for now, according to the first quoted paragraph of this message.
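A rough sketch of the "swallow the tables, fetch strings from disk" idea,
assuming Python 3's mmap module and a UTF-8 catalog.  Since the hash
function is only documented in the GNU sources, this sketch deliberately
swaps in a different technique: a binary search over the table of
original strings, relying on the assumption that msgfmt writes those
originals sorted.  Only the 8-byte descriptors are held in memory, so a
lookup reads O(log n) strings through the mapping rather than the single
seek hoped for with double hashing.  MmapCatalog and write_mo are
hypothetical names.

```python
import mmap
import os
import struct
import tempfile

LE_MAGIC, BE_MAGIC = 0x950412DE, 0xDE120495


def write_mo(path, catalog):
    """Hypothetical helper: minimal little-endian MO file, originals sorted, no hash table."""
    pairs = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in catalog.items())
    n = len(pairs)
    ids_start = 28 + 16 * n         # header plus two descriptor tables
    strs_start = ids_start + sum(len(k) + 1 for k, _ in pairs)
    otab, ttab, ids, strs = b"", b"", b"", b""
    for k, v in pairs:
        otab += struct.pack("<2I", len(k), ids_start + len(ids))
        ttab += struct.pack("<2I", len(v), strs_start + len(strs))
        ids += k + b"\0"
        strs += v + b"\0"
    header = struct.pack("<7I", LE_MAGIC, 0, n, 28, 28 + 8 * n, 0, 0)
    with open(path, "wb") as fp:
        fp.write(header + otab + ttab + ids + strs)


class MmapCatalog:
    """Keep only the descriptor tables in memory; read strings through mmap."""

    def __init__(self, path):
        with open(path, "rb") as fp:
            self._mm = mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ)
        magic = struct.unpack_from("<I", self._mm, 0)[0]
        if magic == LE_MAGIC:
            e = "<"
        elif magic == BE_MAGIC:
            e = ">"
        else:
            raise ValueError("not an MO file")
        _revision, n, otab, ttab = struct.unpack_from(e + "4I", self._mm, 4)
        # The quick swallow: (length, offset) pairs only, strings stay on disk.
        self._orig = [struct.unpack_from(e + "2I", self._mm, otab + 8 * i) for i in range(n)]
        self._trans = [struct.unpack_from(e + "2I", self._mm, ttab + 8 * i) for i in range(n)]

    def _read(self, table, i):
        length, offset = table[i]
        return self._mm[offset:offset + length]

    def gettext(self, message):
        key = message.encode("utf-8")   # assumes a UTF-8 catalog
        lo, hi = 0, len(self._orig)
        while lo < hi:                  # binary search over sorted originals
            mid = (lo + hi) // 2
            if self._read(self._orig, mid) < key:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(self._orig) and self._read(self._orig, lo) == key:
            return self._read(self._trans, lo).decode("utf-8")
        return message                  # untranslated: fall back to the msgid

    def close(self):
        self._mm.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "de.mo")
        write_mo(path, {"file": "Datei", "open": "öffnen"})
        cat = MmapCatalog(path)
        print(cat.gettext("open"))      # öffnen
        print(cat.gettext("close"))     # close (no translation)
        cat.close()
```

With a real hash table in place of the binary search, the in-memory part
would stay the same size while most lookups touched a single string.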

> > The last fear might be that the POT file is too big for
> > translators to handle.

> That indeed is my concern.  The largest catalog so far was Lynx
> (AFAICT), with 1100 messages.  I guess gcc might also be pretty large.

It should be the concern of translators, national teams, or the
Translation Project, but surely not the concern of programmers for the
sake of translators.  At the start of the Translation Project, it was a
recurrent difficulty that each and every programmer felt the need to
decide how translators should work.  Better to keep responsibilities
well separated: everybody sleeps better, and is happier in the long run.

> I think for the Python docstring catalog, we can give some guidance -
> perhaps by not shipping everything at once, but by waiting for
> translators to complete the most interesting things first (like
> docstrings for the builtin core functions).

No, no, I don't think so.  As programmers, we should just not interfere.
Believe me, people do not need so much of our precious "guidance".

> I'm certain it will take some time to get translations back, so if 
> we want to have something in the next release (after 2.0), we should
> start today.

This is another matter.  You have to lose hope, _right now_, of ever
keeping all translations synchronous with releases.  Some teams, only a
few of them, react quickly, but most teams are slow.  You will face
endless irritation, and might end up pretty disgusted, if you start
pushing and pulling on teams.  You have to make peace with yourself,
and stay calm.

Consider, as a programmer, that your job is to internationalise your
scripts (and maybe to comply, once in a while, when you receive reports
about baking too much English grammar into your run-time construction of
strings), and then to accept translations from translators, almost
blindly, without judging whether they are worth distributing or not.
Linguistic matters are to be discussed and resolved between the national
team of translators for a language and the users of that language.  You
have to detach yourself, as a programmer, from all such concerns: they
are not yours.  The quality of your package is orthogonal to, and
independent of, the quality of its translations; this should be
absolutely clear to everybody.
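As an illustration of the kind of run-time string construction that
draws such reports, here is a minimal sketch assuming Python 3's gettext
module.  The function names are hypothetical; NullTranslations is used
so the code runs without any catalog installed and simply returns the
English messages.

```python
import gettext

# With no catalog installed, NullTranslations returns the English msgids unchanged.
_t = gettext.NullTranslations()
_ = _t.gettext
ngettext = _t.ngettext


def deleted_bad(count, directory):
    # Fragile: hard-codes English word order and the English plural rule,
    # and hands translators only sentence fragments.
    return (_("Deleted") + " " + str(count) + " "
            + ("file" if count == 1 else "files") + " "
            + _("from") + " " + directory)


def deleted_good(count, directory):
    # One complete sentence per message, named placeholders so translators
    # can reorder them, and ngettext so each language applies its own plural rule.
    template = ngettext("Deleted %(count)d file from %(dir)s",
                        "Deleted %(count)d files from %(dir)s", count)
    return template % {"count": count, "dir": directory}


if __name__ == "__main__":
    print(deleted_bad(3, "/tmp"))
    print(deleted_good(3, "/tmp"))
```

In English both functions print the same text, but only the second gives
a translator a whole sentence whose word order and plural forms can be
adapted, without the programmer having to judge the result.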

François Pinard   http://www.iro.umontreal.ca/~pinard