
Hi,

Alexis Georges wrote:
I am maintaining a multilingual website which uses XML and XSLT to generate XHTML.
I am working with Apache Cocoon (http://cocoon.apache.org/2.1/) using (among other things) their I18NTransformer. Basically I can use elements in the I18N (http://apache.org/cocoon/i18n/2.1) namespace, and then tell Cocoon to apply the I18NTransformer to the document; this replaces the I18N elements with a localized value (e.g. a formatted date/number, a translated label/attribute, etc.).
I have been looking at lxml a little bit to see if I could move to a Python-based framework for the website. I am not quite sure how to go about the I18N part though.
Using the Babel library (http://babel.edgewall.org/) along with request headers to generate localized data, I have everything I need. What is missing is the "parser" for the I18N elements. All I can think of right now is to implement a SAX parser, the way Cocoon does (in Java).
There is a SAX-like interface in lxml.etree, called the "target parser". However, if your documents fit into memory, using iterparse() is a lot simpler (and likely not even much slower). Something like this might work:

    from lxml import etree

    context = etree.iterparse(
        "somefile.xml", tag="{http://apache.org/cocoon/i18n/2.1}*")

    for event, i18n_element in context:
        # get_i18n_replacement_for() is your own function; it returns
        # the element that takes the place of the I18N element
        new_element = get_i18n_replacement_for(i18n_element)
        i18n_element.getparent().replace(i18n_element, new_element)

    # the finished tree is reachable through the iterator's root property
    context.root.getroottree().write("newfile.xml")

See here for some documentation:

http://codespeak.net/lxml/parsing.html

You can also achieve the same thing in XSLT, or using XPath, or ...

Stefan
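
For the XPath route, a minimal sketch could look like the following. It is only an illustration under a few assumptions: it handles just i18n:text elements, the CATALOGUE dictionary and the helper names replace_with_text() and localize() are made up for this example, and a real site would fetch translations from Babel/gettext for the locale negotiated from the request headers. The document is parsed completely first, so the tree is not modified while the parser is still running.

```python
from lxml import etree

I18N_NS = "http://apache.org/cocoon/i18n/2.1"

# Hypothetical catalogue for illustration only; a real site would ask
# Babel/gettext for the locale negotiated from the request headers.
CATALOGUE = {"fr": {"Hello": "Bonjour"}}


def replace_with_text(element, text):
    """Remove `element`, leaving `text` plus the element's tail in its place."""
    parent = element.getparent()
    if parent is None:
        raise ValueError("cannot replace the root element")
    text += element.tail or ""
    previous = element.getprevious()
    if previous is not None:
        # Text after a sibling lives in that sibling's tail.
        previous.tail = (previous.tail or "") + text
    else:
        # No preceding sibling: the text belongs to the parent.
        parent.text = (parent.text or "") + text
    # lxml removes the tail together with the element, which is why
    # the tail was copied into `text` above.
    parent.remove(element)


def localize(tree, locale):
    """Swap every i18n:text element for its translation (key as fallback)."""
    messages = CATALOGUE.get(locale, {})
    # xpath() returns a plain list, so the tree can be modified while looping.
    for element in tree.xpath("//i18n:text", namespaces={"i18n": I18N_NS}):
        key = element.text or ""
        replace_with_text(element, messages.get(key, key))
    return tree


doc = etree.fromstring(
    '<p xmlns:i18n="%s"><i18n:text>Hello</i18n:text>, world</p>' % I18N_NS
)
localize(doc, "fr")
result = etree.tostring(doc, encoding="unicode")
```

Note the tail handling: in lxml, the text that follows an element is stored in that element's tail, so removing or replacing the element without copying the tail would silently drop ", world" in the sample above.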