Marking translatable strings

Thu Sep 16 11:01:54 EDT 1999

In article <P07E3.180$xm1.51192 at news.direcpc.com>, Emile van Sebille
<emile at fenx.com> writes
>Hi Francois,
>
....
>If a change to the language is in the offing, why not introduce
>something like a t"TEXT" structure that behaves as r"..." but also
>serves as a marker.  Then we'd have 12 ways to express a quoted string.
>;-)
>
More generally, wouldn't one need to allow for quoted strings prefixed by
nothing, r, t, rt or tr? And if that's the case, why stop at only three
prefix symbols (counting empty)?
>Otherwise, it sounds interesting.  Where can I find out more about the
>project?
>
>Regards,
>--
>
>Emile van Sebille
>emile at fenx.com
>-------------------
>
>
>François Pinard <pinard at iro.umontreal.ca> wrote in message
>news:oq4sgvf0m1.fsf at titan.progiciels-bpi.ca...
>> Hello, people.
>>
>> I had a real strange idea :-).  I first quickly dismissed it, but it is
>> so simple that I prefer to ponder it again, and share and debate it a
>> little first, maybe.
>>
>> My friends know that I've been working at software internationalisation
>> for many years now, with a stress on program messages.  Of course, I
>> want to include Python in the realm of my possibilities, and myself
>> start writing internationalised scripts soon, in such a way that
>> everything links nicely with the Translation Project.
>>
>> So, I want the big picture right now.  That is: a technique for marking
>> strings for automatic extraction and building of PO files, and a
>> technique for using PO files from within Python scripts.  I foresee that
>> Python introduces an unusual difficulty in that the textual domain for
>> translations may vary quite unexpectedly, when the control dynamically
>> flies between independent packages under different textual domains.
>>
>> I started a discussion with Guido about this, but I'm a slow thinker,
>> and would not like to rush things before feeling rather solid, as
>> Guido's time is precious.  But on this forum, I thought I could dare
>> exploring ideas, asking for your forgiveness for any blunder I could
>> make while thinking. :-)
>>
>> 1) Marking strings
>>    ---------------
>>
>> There are many circumstances where the translation of strings could be
>> delayed from where they textually occur, and language syntactic
>> considerations could make the marking difficult in a few cases.  In C,
>> there is a pre-processor between the sources and the compiler, so it is
>> easy to introduce an identity macro whose special name is recognised by
>> the string extractor, and which vanishes before the compiler sees it.
>> We use:
>>
>>    #define N_(Text) Text
>>
>> for that purpose.  In languages where such preprocessing is difficult,
>> like for `ksh' or `bash', strings are especially marked in the syntax,
>> like in:
>>
>>    $"translatable text"
>>
>> but this requires a modification to the interpreter.  Emacs, with
>> `defsubst', also allows for macro expansion, and we then use tricks as
>> in C (anywhere except for doc strings).  Some other flavours of LISP are
>> also open to such tricks.
>>
>> Python has no preprocessing, no special string syntax for markability,
>> and moreover, it has doc strings!  So, at first glance, it looks
>> difficult.  However, and this is where my strange idea comes into play,
>> it has eight types of strings: ', ", ''', """, r', r", r''' and r""";
>> and I thought that maybe we could just discipline ourselves to give more
>> meaning to all these differences, since after all, if we except some
>> ending backslash considerations, all eight types are equally capable of
>> representing any string.
>>
>> I'm a strange, anal man, who needs a reason behind the slightest
>> choices, and believe me, eight types of strings gave me a lot of food
>> for this mania, all along while writing.  I'm still exploring! :-) Yet,
>> after having played with Python for almost 10 days, now, I came to
>> realise that I'm more naturally tempted to stick to the 'TEXT' notation
>> for computer strings and "TEXT" notation for human strings, the reasons
>> being that there might be a lot of apostrophes in human text, and that
>> traditionally, we more usually quote sentences with "TEXT", while we
>> quote words with `TEXT' (note the grave accent at the left).
>>
>> So, the bizarre idea I got is that one could formalize this into
>> a rule: strings of type ", """, r" and r""" could all be markable as
>> translatable, while strings of type ', ''', r' and r''' would not be.
>> On the other hand, this might be overkill, as maybe people are used to
>> freely mixing types ' and ", and this change could be seen as stressful.
>> Could we choose better?
>>
>> Surely, since doc strings use """ exclusively, there is no choice but to
>> retain type """ for translatability, wherever it appears.  However,
>> forcing the use of """ everywhere we want translatability is an overhead
>> of four characters (just compare "TEXT" with """TEXT"""), while C uses
>> three or four characters (compare "TEXT" with _("TEXT") or N_("TEXT")),
>> and bash uses only one (compare "TEXT" with $"TEXT").  I would like
>> Python to be as comfortable as possible.  If I could plainly use "TEXT"
>> instead of 'TEXT' to mark translatability, I would have an overhead of
>> zero characters, which would be better than everything, but I'm not sure
>> if this constraint would be acceptable to Python writers.
>>
>> Another possibility is to use ''"TEXT" instead of "TEXT", making an
>> overhead of two characters: that is the compile time concatenation of ''
>> with "TEXT".  This combination is quite unlikely to me, and a bit uglier.
>>
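For what it's worth, the double-quote rule described above could probably
be prototyped without touching the interpreter. What follows is only a
rough sketch, assuming the standard `tokenize` module is used to read the
source; the function name `translatable_strings` and the sample text are
made up for illustration, and bytes or f-string literals are simply
skipped.

```python
import ast
import io
import tokenize


def translatable_strings(source):
    """Collect the values of string literals written with double quotes."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        literal = tok.string
        # The first quote character tells which family the literal is in;
        # anything before it is a prefix such as r or R.
        quote_at = min(i for i, ch in enumerate(literal) if ch in "'\"")
        prefix = literal[:quote_at].lower()
        if "b" in prefix or "f" in prefix:
            continue  # bytes and f-strings are outside this sketch
        if literal[quote_at] == '"':
            # " and """ (raw or not) count as translatable under the rule.
            found.append(ast.literal_eval(literal))
    return found


# Hypothetical sample source mixing both families of quotes.
SAMPLE = r'''
key = 'config_name'
msg = "File not found"
pat = r"\d+ items"
doc = """Usage: prog FILE"""
'''

print(translatable_strings(SAMPLE))
```

Under this sketch the ''"TEXT" form would also be picked up, since the
tokenizer sees '' and "TEXT" as two separate literals and only the second
one is double-quoted.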
>> 2) Translating strings
>>    -------------------
>>
>> (Oops, I just received a phone call forcing me to leave fairly soon, so
>> I have to be very concise for the remainder of this message.  Let's
>> rather develop these in the possible thread that might follow from this
>> message.)
>>
>> What would be the most comfortable for me, short of having the Python
>> interpreter modified, is to merely use a function to force the actual
>> translation of a string.  The most comfortable (the least intrusive) way
>> would be to call:
>>
>>         _(TEXT)
>>
>> to get the translation of text.  It resembles C, but it overloads `_',
>> which already has a preset meaning, interactively.  If I could push the
>> preset `_' somewhere else, maybe on `__', I would do it and reserve `_'
>> for translation, which would be much, much more common in the long run.
>>
>> Using a function would allow us to build the whole translation chain
>> (administrating the translations with teams, etc.), yet if the syntax
>> could be relieved with the help of Guido, I guess this would be welcome.
>> We might need to experiment first.
>>
>> 3) Setting the textual domain
>>    --------------------------
>>
>> In a quick word, I guess that this problem could be fairly easily solved
>> through the handy scope rules for resolution of names in Python.  Each
>> module could have a standard global variable name setting which textual
>> domain to use within it.  So, even with the control flying like hell
>> between modules, it would not be a problem on average.  But there are
>> problematic cases, like for when untranslated strings are transmitted to
>> other modules, for being translated there, or even maybe for plain doc
>> strings.  This requires good thought.  This problem is more difficult
>> than many might think at first.
>>
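One way the per-module global might work, as a sketch only: a shared `_`
peeks at the globals of whoever called it and picks the catalogue from
there. The variable name `TEXT_DOMAIN`, the `CATALOGS` table and the two
simulated modules are all invented for the example, and `sys._getframe`
is a CPython implementation detail rather than something to rely on.

```python
import sys

# Hypothetical catalogues, one per textual domain.
CATALOGS = {
    "mailer": {"Send": "Envoyer"},
    "editor": {"Send": "Transmettre"},
}


def _(text):
    """Translate text in the domain named by the caller's module global."""
    caller_globals = sys._getframe(1).f_globals
    domain = caller_globals.get("TEXT_DOMAIN")  # hypothetical variable name
    return CATALOGS.get(domain, {}).get(text, text)


def make_module(domain):
    """Simulate a module: its own global namespace with its own domain."""
    namespace = {"TEXT_DOMAIN": domain, "_": _}
    exec("def label():\n    return _('Send')\n", namespace)
    return namespace


mailer = make_module("mailer")
editor = make_module("editor")

print(mailer["label"]())
print(editor["label"]())
```

The problematic case raised above shows up directly here: a string handed
untranslated from one module to another gets looked up in the receiving
module's domain, not in the domain of the module where it was written.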
>> OK, I have to rush away now.  Thanks for listening! :-)
>>
>> --
>> François Pinard   http://www.iro.umontreal.ca/~pinard
>>
>>
>>
>
>

-- 
Robin Becker