[Python-Dev] The "i" string-prefix: I18n'ed strings

Martin Blais blais at furius.ca
Thu Apr 6 17:48:13 CEST 2006


Hi all

I got an evil idea for Python this morning -- Guido: no, it's not
about linked lists :-) -- and I'd like to bounce it around here.  But
first, a bit of context.


In the context of writing i18n apps, programmers have to "mark"
strings that may be internationalized in a way that

- a special hook gets called at runtime to perform the lookup in a
catalog of translations set up for a specific language;

- they can be extracted by an external tool to produce the keys of all
the catalogs, so that translators can update the list of keys to
translate and produce the values in the target languages.

Usually, you bind a function to a short name, like _() and N_(), and
it looks kind-of like this::

    _("My string to translate.")

    or

    N_("This is marked for translation") # N_() is a noop.

pygettext does the work of extracting those patterns from the files,
doing all the parsing itself, i.e. it does not use runtime Python
introspection at all; it is just a simple text-parsing algorithm
(which works pretty well).  I'm simplifying things a bit, but that is
the gist of how it works, for those not familiar with i18n.
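
For readers who have not seen the convention, the two markers can be
set up roughly as follows.  This is only a minimal sketch: it assumes
the stdlib gettext module, uses its do-nothing NullTranslations
catalog as a stand-in for a real compiled catalog, and defines N_ by
hand as an identity function.

```python
import gettext

# Assumption: no compiled catalog is available, so use the stdlib
# fallback, which returns every key unchanged.
catalog = gettext.NullTranslations()
_ = catalog.gettext


def N_(message):
    """Mark a string for extraction without translating it at runtime."""
    return message


greeting = _("My string to translate.")           # looked up at runtime
deferred = N_("This is marked for translation")   # only marked, not looked up
later = _(deferred)                               # translated when needed
```

With a real catalog, only the object bound to `catalog` would change;
the call sites stay the same.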


This morning I woke up staring at the ceiling and the only thing in my
mind was "my web app code is ugly".  I had visions of LISP parentheses
with constructs like

   ...
   A(P(_("Click here to forget"), href="...
   ...

(In my example, I built a library not unlike stan for creating HTML,
which is where classes A and P come from.)  I find the i18n markup a
bit annoying, especially when there are many i18n strings close
together.  My point is: adding parentheses around almost all strings
gets tiresome and clutters the otherwise divine esthetics of Python
source code.

(Okie, that's enough for context.)


So I had the following idea: would it not be nice if there existed a
string prefix 'i' -- like the prefixes for raw (r'...') and unicode
(u'...') strings -- that would mark the string as being for i18n?
Something like this (reusing my example above)::

   A(P(i"Click here to forget", href="...

Notes:

- We could then use the spiffy new AST to build a better parser to
extract those strings from the source code as well.

- We could also have a prefix "I" for strings to be marked but not
runtime-translated, to replace the N_() strings.

- This implies that we would have to introduce some way for these
strings to call a custom function at runtime.

- My impression is that this process of i18n is common enough that it
does not "move" very much, and that there aren't 18 ways to go about
it either, so that it would be reasonable to consider adding it to the
language.  This may be completely wrong; I am by no means an i18n
expert, so please tell me if this is not the case.

- Potential issue: do we still need other prefixes when 'i' is used,
and if so, how do we combine them...
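
To make the first note a bit more concrete, here is a rough sketch of
what an AST-based extractor could look like.  It assumes the stdlib
ast module as found in current Python 3, and it looks for the existing
_() and N_() calls rather than a new prefix; the function name and the
sample source are made up for illustration.

```python
import ast


def extract_marked_strings(source, markers=("_", "N_")):
    """Return sorted (lineno, text) pairs for literal strings passed
    as the first argument to one of the marker functions."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in markers
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)


# Hypothetical input, parsed but never executed.
sample = 'title = _("Click here to forget")\nlabel = N_("Later")\n'
keys = extract_marked_strings(sample)
```

Because the source is only parsed, the marker functions do not need to
exist at extraction time.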


Okay, let's push it a bit further: what if there were some kind of
generic mechanism built into Python for adding new string prefixes
which invoke callbacks when the string with the prefix is evaluated?
This could be used to implement what I'm suggesting above, and beyond.
Something like this::

   import i18n
   i18n.register_string_prefix('i', _)
   i18n.register_string_prefix('I', N_)

I'm not sure what else we might be able to do with this; you may have
other useful ideas.
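
Python has no such hook, so the following is only an emulation of the
behaviour the registry would need: a table from prefix to callback,
consulted when a prefixed literal is evaluated.  The name
register_string_prefix comes from the example above; evaluate_prefixed,
the toy catalog and its French entry are hypothetical and exist only
for this sketch.

```python
_prefix_handlers = {}


def register_string_prefix(prefix, handler):
    """Associate a one-letter prefix with a callback taking the string."""
    _prefix_handlers[prefix] = handler


def evaluate_prefixed(prefix, text):
    """Stand-in for what the interpreter would do for prefix"text"."""
    return _prefix_handlers[prefix](text)


# Toy catalog, invented for illustration only.
toy_catalog = {"Click here to forget": "Cliquez ici pour oublier"}


def _(message):
    # Fall back to the key itself when no translation is known.
    return toy_catalog.get(message, message)


def N_(message):
    return message


register_string_prefix("i", _)
register_string_prefix("I", N_)

translated = evaluate_prefixed("i", "Click here to forget")
marked_only = evaluate_prefixed("I", "Click here to forget")
```

In a real implementation the lookup would have to happen each time the
literal is evaluated, not once at compile time, since the active
language can change while the program runs.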


Any comments welcome.

cheers,

