Re: [lxml-dev] XML Documents & I18N (the way Cocoon does it)

April 28, 2009

Hi,

Alexis Georges wrote:
...
I am maintaining a multilingual website that uses XML and XSLT to
generate XHTML.
I am working with Apache Cocoon (http://cocoon.apache.org/2.1/) using
(among other things) their I18NTransformer. Basically I can use elements
in the I18N (http://apache.org/cocoon/i18n/2.1) namespace, and then tell
Cocoon to apply the I18NTransformer to the document; this replaces the
I18N elements with a localized value (e.g. a formatted date/number, a
translated label/attribute, etc.).
I have been looking at lxml a little bit to see if I could move to a
Python-based framework for the website. I am not quite sure how to go
about the I18N part though.
Using the Babel library (http://babel.edgewall.org/) along with request
headers to generate localized data, I have everything I need. What is
missing is the "parser" for the I18N elements. All I can think of right
now is to implement a SAX parser, the way Cocoon does (in Java).
There is a SAX-like interface in lxml.etree, called "target parser".
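
To give an idea of what a parser target looks like, here is a minimal sketch. It assumes you only need to handle <i18n:text> elements and that translations live in a plain dict; I18NTarget, CATALOGUE and translate() are names made up for this example, not part of lxml or Cocoon. The target forwards everything to a TreeBuilder, except that it swallows <i18n:text> and emits the translated string in its place.

```python
try:
    from lxml import etree
except ImportError:
    # The standard library parser accepts the same kind of target object.
    import xml.etree.ElementTree as etree

I18N_TEXT = "{http://apache.org/cocoon/i18n/2.1}text"

# Hypothetical catalogue; in practice this would come from Babel/gettext.
CATALOGUE = {"Hello": "Bonjour", "Goodbye": "Au revoir"}


class I18NTarget:
    """Parser target that swaps <i18n:text> elements for translated text."""

    def __init__(self, catalogue):
        self._catalogue = catalogue
        self._builder = etree.TreeBuilder()
        self._key = None  # text pieces collected inside <i18n:text>

    def start(self, tag, attrib):
        if tag == I18N_TEXT:
            self._key = []
        else:
            self._builder.start(tag, dict(attrib))

    def data(self, data):
        if self._key is not None:
            self._key.append(data)
        else:
            self._builder.data(data)

    def end(self, tag):
        if tag == I18N_TEXT:
            key = "".join(self._key)
            self._key = None
            # Fall back to the untranslated key if there is no entry.
            self._builder.data(self._catalogue.get(key, key))
        else:
            self._builder.end(tag)

    def close(self):
        # Whatever close() returns is what the parse call hands back.
        return self._builder.close()


def translate(xml_bytes, catalogue=CATALOGUE):
    parser = etree.XMLParser(target=I18NTarget(catalogue))
    return etree.fromstring(xml_bytes, parser)
```

Calling translate() on a document returns the root element of the rebuilt tree. Nested I18N elements, attributes and number/date formatting are left out here and would need more state in the target.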

However, if your documents fit into memory, using iterparse() is a lot
simpler (and likely not even much slower).

Something like this might work:

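     from lxml import etree

     # Report only elements in the Cocoon I18N namespace ("end" events).
     context = etree.iterparse(
               "somefile.xml",
               tag = "{http://apache.org/cocoon/i18n/2.1}*")

     for event, i18n_element in context:
         # get_i18n_replacement_for() is yours to write; it returns an element.
         new_element = get_i18n_replacement_for(i18n_element)
         # replace() takes the old element's tail text along, so carry it over.
         new_element.tail = i18n_element.tail
         i18n_element.getparent().replace(i18n_element, new_element)

     # iterparse() exposes the parsed root element through its .root property.
     context.root.getroottree().write("newfile.xml")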

See here for some documentation:

http://codespeak.net/lxml/parsing.html

You can also achieve the same thing in XSLT, or using XPath, or ...
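
For the XPath route, a rough sketch on an already parsed tree could look like the following. Again it only covers <i18n:text>, and CATALOGUE and localize() are hypothetical names for illustration. Since the replacement here is plain text rather than an element, it gets merged into the surrounding text/tail before the I18N element is removed.

```python
from lxml import etree

I18N_NS = "http://apache.org/cocoon/i18n/2.1"

# Hypothetical catalogue; in practice this would come from Babel/gettext.
CATALOGUE = {"Hello": "Bonjour"}


def localize(root, catalogue=CATALOGUE):
    """Replace every <i18n:text> below root with its translated text."""
    # The result list is built up front, so editing the tree in the loop is safe.
    for el in root.xpath("//i18n:text", namespaces={"i18n": I18N_NS}):
        parent = el.getparent()
        if parent is None:
            continue  # an I18N root element cannot be replaced by plain text
        translated = catalogue.get(el.text, el.text) or ""
        text = translated + (el.tail or "")
        previous = el.getprevious()
        if previous is not None:
            # Text after a sibling lives in that sibling's tail.
            previous.tail = (previous.tail or "") + text
        else:
            # No preceding sibling: the text belongs to the parent.
            parent.text = (parent.text or "") + text
        # remove() drops the element together with its tail, which was
        # already copied into `text` above.
        parent.remove(el)
    return root
```

Usage would be along the lines of localize(etree.parse("somefile.xml").getroot()), followed by writing the tree back out.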

Stefan

Stefan Behnel