Marking translatable strings

Thu Sep 16 10:11:52 EDT 1999

Hi Francois,

Two quick thoughts on the text marker:

<snip>

> "TEXT" with $"TEXT").  I would like Python to be as comfortable as possible.
> If I could plainly use "TEXT" instead of 'TEXT' to mark translatability,

When I build SQL commands, I commonly use both " and ', one for quoting
my command, and the other for quoting within the command.
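
A minimal sketch of that clash, assuming an extractor that applies the
double-quote rule quoted above purely by looking at each literal's quote
character. The helper name double_quoted_strings and the sample source are
hypothetical; only the standard tokenize module is used.

```python
import io
import tokenize


def double_quoted_strings(source):
    """Return the string literals in source that are written with double quotes."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # Skip prefix letters such as r or b to reach the quote character.
        literal = tok.string.lstrip("rRbBuUfF")
        if literal.startswith('"'):
            found.append(tok.string)
    return found


# Hypothetical module text: one human message, one key, one SQL command.
SAMPLE = '''
greeting = "Hello, world"
key = 'user_id'
query = "SELECT name FROM users WHERE city = 'Montreal'"
'''

flagged = double_quoted_strings(SAMPLE)
for literal in flagged:
    print(literal)
```

Under these assumptions the SQL command is flagged along with the greeting,
because the outer quotes were chosen to leave ' free for use inside the
command, not to signal translatability.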

> Another possibility is to use ''"TEXT" instead of "TEXT", making an overhead
> of two characters: that is the compile time concatenation of '' with "TEXT".
> This combination is quite unlikely to me, and a bit uglier.

I like this better, as it can immediately be recognized as a marker.
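
As a rough illustration of why this marker is easy to recognize, here is a
sketch of an extractor that looks for an empty single-quoted literal
immediately followed by a double-quoted one. The function name
marked_strings and the sample text are hypothetical, and a real extractor
would also have to deal with prefixes, comments and line continuations.

```python
import ast
import io
import tokenize


def marked_strings(source):
    """Return the values of string literals written as ''"TEXT"."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    marked = []
    for prev, cur in zip(tokens, tokens[1:]):
        if (prev.type == tokenize.STRING and prev.string == "''"
                and cur.type == tokenize.STRING
                and cur.start == prev.end        # no space between the two
                and cur.string.startswith('"')):
            marked.append(ast.literal_eval(cur.string))
    return marked


# Adjacent literals are concatenated at compile time, so the marker does
# not change the value of the string.
same_value = (''"Save file" == "Save file")

# Hypothetical module text: one marked message, one unmarked string.
SAMPLE = """
title = ''"Save file"
mode = "rb"
"""

print(same_value)
print(marked_strings(SAMPLE))
```

Only the literal carrying the '' prefix is reported, so unmarked strings such
as SQL commands can keep whichever quote character suits them.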

Perhaps the marker could vary by package/module and be set by a
(programmatic/human) parser at the package/module level.  Short of a
formal change in the language, this may allow existing and
in-development programs to take advantage of translation
opportunities/requirements.

If a change to the language is in the offing, why not introduce
something like a t"TEXT" structure that behaves as r"..." but also
serves as a marker?  Then we'd have 12 ways to express a quoted string. ;-)

Otherwise, it sounds interesting.  Where can I find out more about the
project?

Regards,
--

Emile van Sebille
emile at fenx.com
-------------------

François Pinard <pinard at iro.umontreal.ca> wrote in message
news:oq4sgvf0m1.fsf at titan.progiciels-bpi.ca...
> Hello, people.
>
> I had a real strange idea :-).  I first quickly dismissed it, but it is so
> simple that I prefer to ponder it again, and share and debate it a little
> first, maybe.
>
> My friends know that I've been working at software internationalisation for
> many years now, with a stress on program messages.  Of course, I want to
> include Python in the realm of my possibilities, and myself start writing
> internationalised scripts soon, in such a way that everything links nicely
> with the Translation Project.
>
> So, I want the big picture right now.  That is: a technique for marking
> strings for automatic extraction and building of PO files, and a technique
> for using PO files from within Python scripts.  I foresee that Python
> introduces an unusual difficulty in that the textual domain for translations
> may vary quite unexpectedly, when the control dynamically flies between
> independent packages under different textual domains.
>
> I started a discussion with Guido about this, but I'm a slow thinker, and
> would not like to rush things before feeling rather solid, as Guido's time
> is precious.  But on this forum, I thought I could dare exploring ideas,
> asking for your forgiveness for any blunder I could make while thinking. :-)
>
> 1) Marking strings
>    ---------------
>
> There are many circumstances where string translation could be delayed
> from where the strings textually occur, and where syntactic considerations
> of the language could make the marking difficult in a few cases.  In C,
> there is a pre-processor between the sources and the compiler, so it is easy
> to introduce an identity macro whose special name is recognised by the
> string extractor, and which vanishes before the compiler sees it.  We use:
>
>    #define N_(Text) Text
>
> for that purpose.  In languages where such preprocessing is difficult, like
> for `ksh' or `bash', strings are specially marked in the syntax, like in:
>
>    $"translatable text"
>
> but this requires a modification to the interpreter.  Emacs, with `defsubst',
> also allows for macro expansion, and we then use tricks as in C (anywhere
> except for doc strings).  Some other flavours of LISP are also open to
> such tricks.
>
> Python has no preprocessing, no special string syntax for markability,
> and moreover, it has doc strings!  So, at first glance, it looks difficult.
> However, and this is where my strange idea comes into play, it has eight
> types of strings: ', ", ''', """, r', r", r''' and r"""; and I thought
> that maybe we could just discipline ourselves to give more meaning to all
> these differences, since after all, if we except some ending backslash
> considerations, all eight types are equally capable of representing
> any string.
>
> I'm a strange, anal man, who needs a reason behind the slightest choices,
> and believe me, eight types of strings gave me a lot of food for this
> mania, all along while writing.  I'm still exploring! :-) Yet, after having
> played with Python for almost 10 days, now, I came to realise that I'm more
> naturally tempted to stick to the 'TEXT' notation for computer strings and
> "TEXT" notation for human strings, the reasons being that there might
> be a lot of apostrophes in human text, and that traditionally, we more
> usually quote sentences with "TEXT", while we quote words with `TEXT'
> (note the grave accent at the left).
>
> So, the bizarre idea I got is that one could formalize this into
> a rule: strings of type ", """, r" and r""" could be all markable as
> translatable, while strings of type ', ''', r' and r''' would not be.
> On the other hand, this might be overkill, as maybe people are used to
> freely mixing types ' and ", and this change could be seen as stressful.
> Could we choose better?
>
> Surely, since doc strings use """ exclusively, there is no choice but to
> retain type """ for translatability, wherever it appears.  However, forcing
> the use of """ everywhere we want translatability is an overhead of four
> characters (just compare "TEXT" with """TEXT"""), while C uses three or four
> characters (compare "TEXT" with _("TEXT") or N_("TEXT")), and bash uses only
> one (compare "TEXT" with $"TEXT").  I would like Python to be as comfortable
> as possible.  If I could plainly use "TEXT" instead of 'TEXT' to mark
> translatability, I would have an overhead of zero characters, which would be
> better than everything, but I'm not sure if this constraint would be
> acceptable to Python writers.
>
> Another possibility is to use ''"TEXT" instead of "TEXT", making an overhead
> of two characters: that is the compile time concatenation of '' with "TEXT".
> This combination is quite unlikely to me, and a bit uglier.
>
> 2) Translating strings
>    -------------------
>
> (Oops, I just received a phone call forcing me to leave fairly soon, so I
> have to be very concise for the remainder of this message.  Let's rather
> develop these in the possible thread that might follow from this message.)
>
> What would be the most comfortable for me, short of having the Python
> interpreter modified, is to merely use a function to force the actual
> translation of a string.  The most comfortable (the least intrusive) way
> would be to call:
>
>         _(TEXT)
>
> to get the translation of text.  It resembles C, but it overloads `_',
> which already has a preset meaning, interactively.  If I could push the
> preset `_' somewhere else, maybe on `__', I would do it and reserve `_'
> for translation, which would be much, much more common in the long run.
>
> Using a function would allow us to build the whole translation chain
> (administrating the translations with teams, etc.), yet if the syntax
> could be relieved with the help of Guido, I guess this would be welcome.
> We might need to experiment first.
>
> 3) Setting the textual domain
>    --------------------------
>
> In a quick word, I guess that this problem could be fairly easily solved
> through the handy scope rules for resolution of names in Python.  Each module
> could have a standard global variable name setting which textual domain to
> use within it.  So, even with the control flying like hell between modules,
> it would not be a problem on average.  But there are problematic cases,
> like for when untranslated strings are transmitted to other modules, for
> being translated there, or even maybe for plain doc strings.  This requires
> good thought.  This problem is more difficult than many might think at first.
>
> OK, I have to rush away now.  Thanks for listening! :-)
>
> --
> François Pinard   http://www.iro.umontreal.ca/~pinard
>
>
>
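
For readers following sections 2 and 3 of the quoted message, here is one
possible sketch of a `_' function that picks its textual domain from a global
variable of the calling module. Everything named here is an assumption made
for illustration: the variable TEXT_DOMAIN, the in-memory CATALOGS (standing
in for PO files), the make_module helper that fakes two modules in one file,
and the French strings. It also relies on sys._getframe, which is a CPython
implementation detail.

```python
import sys

# Hypothetical catalogs, one per textual domain, standing in for PO files.
CATALOGS = {
    "editor": {"Open": "Ouvrir"},
    "mailer": {"Open": "Ouvert"},
}


def _(text):
    """Translate text using the TEXT_DOMAIN global of the calling module."""
    caller_globals = sys._getframe(1).f_globals
    domain = caller_globals.get("TEXT_DOMAIN")
    # Fall back to the untranslated text when no domain or entry is found.
    return CATALOGS.get(domain, {}).get(text, text)


MODULE_SOURCE = (
    "def label():\n"
    "    return _('Open')\n"
    "def show(text):\n"
    "    return _(text)\n"
)


def make_module(domain):
    """Simulate a module: a separate global namespace with its own domain."""
    namespace = {"TEXT_DOMAIN": domain, "_": _}
    exec(MODULE_SOURCE, namespace)
    return namespace


editor = make_module("editor")
mailer = make_module("mailer")

print(editor["label"]())       # looked up in the "editor" domain
print(mailer["label"]())       # same source text, "mailer" domain
print(mailer["show"]("Open"))  # a string handed over untranslated
```

The last call shows the problematic case from section 3: a string that
originates in one module but is translated in another is looked up in the
receiving module's domain, which may not be the one its author intended.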