Hi,

This is a bit late, but thanks for the response. I am playing around with iterparse() and am following the advice you gave.

I have a question, though: I could not find a way to consume an element and replace it with just text. For example, <i18n:text>hello</i18n:text>, when found in the middle of a paragraph, should be replaced by plain text. The replace() method requires the replacement to be an element. Is this possible?

Thanks!
Alexis Georges

On 28-Apr-09, at 1:59 PM, Stefan Behnel wrote:
Hi,
Alexis Georges wrote:
I am maintaining a multilingual website which uses XML and XSLT to generate XHTML.
I am working with Apache Cocoon (http://cocoon.apache.org/2.1/) using (among other things) their I18NTransformer. Basically, I can use elements in the I18N (http://apache.org/cocoon/i18n/2.1) namespace and then tell Cocoon to apply the I18NTransformer to the document; this replaces the I18N elements with a localized value (e.g. a formatted date/number, a translated label/attribute, etc.).
I have been looking at lxml a little bit to see if I could move to a Python-based framework for the website. I am not quite sure how to go about the I18N part though.
Using the Babel library (http://babel.edgewall.org/) along with request headers to generate localized data, I have everything I need. What is missing is the "parser" for the I18N elements. All I can think of right now is to implement a SAX parser, the way Cocoon does (in Java).
There is a SAX-like interface in lxml.etree, called "target parser".
However, if your documents fit into memory, using iterparse() is a lot simpler (and likely not even much slower).
Something like this might work:
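from lxml import etree

# Only report elements in the Cocoon I18N namespace.
context = etree.iterparse(
    "somefile.xml", tag="{http://apache.org/cocoon/i18n/2.1}*")

for event, i18n_element in context:
    # get_i18n_replacement_for() is a placeholder for your own lookup.
    new_element = get_i18n_replacement_for(i18n_element)
    i18n_element.getparent().replace(i18n_element, new_element)

# The parsed tree is reachable through the iterator's root property.
context.root.getroottree().write("newfile.xml")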
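
For the case raised at the top of this thread, where the replacement is plain text rather than an element, one possible approach is to merge the text into the surrounding .text/.tail slots and then remove the element. The following is only a sketch: replace_with_text() and the TRANSLATIONS table are hypothetical names, not part of lxml, and the helper is run on a fully parsed tree because an element's tail may not be available yet at an iterparse() "end" event.

```python
from lxml import etree

I18N_NS = "http://apache.org/cocoon/i18n/2.1"

# Hypothetical lookup table standing in for a real Babel/gettext catalog.
TRANSLATIONS = {"hello": "bonjour"}


def replace_with_text(element, text):
    """Remove `element` from its parent and leave `text` in its place."""
    parent = element.getparent()
    if parent is None:
        raise ValueError("cannot replace the root element with text")
    # lxml removes the tail together with the element, so carry it over.
    combined = text + (element.tail or "")
    previous = element.getprevious()
    if previous is not None:
        # Text after a sibling lives in that sibling's tail.
        previous.tail = (previous.tail or "") + combined
    else:
        # No preceding sibling: the text belongs to the parent's text.
        parent.text = (parent.text or "") + combined
    parent.remove(element)


root = etree.fromstring(
    '<p xmlns:i18n="http://apache.org/cocoon/i18n/2.1">'
    "Say <i18n:text>hello</i18n:text> to <b>everyone</b>.</p>"
)
for el in root.xpath("//i18n:text", namespaces={"i18n": I18N_NS}):
    key = el.text or ""
    replace_with_text(el, TRANSLATIONS.get(key, key))

print(etree.tostring(root, encoding="unicode"))
```

The two branches exist because lxml stores text either on the parent (before its first child) or as the tail of the preceding sibling, so the replacement has to be appended to whichever slot precedes the removed element.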
See here for some documentation:
http://codespeak.net/lxml/parsing.html
You can also achieve the same thing in XSLT, or using XPath, or ...
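
As an illustration of the XPath route, here is a rough sketch that parses the whole document first and then replaces every element in the I18N namespace. The localize() helper and the choice of a <span> as the replacement are assumptions made for the example only; a real implementation would look up the localized value instead of upper-casing the text.

```python
from lxml import etree

I18N_NS = "http://apache.org/cocoon/i18n/2.1"


def localize(element):
    """Hypothetical stand-in that builds a replacement element."""
    new_element = etree.Element("span")
    new_element.text = (element.text or "").upper()
    # replace() takes the old tail away with the old element, so copy it.
    new_element.tail = element.tail
    return new_element


xpath_root = etree.fromstring(
    '<p xmlns:i18n="http://apache.org/cocoon/i18n/2.1">'
    "Say <i18n:text>hello</i18n:text> now</p>"
)
# Select every element in the I18N namespace, whatever its local name.
for el in xpath_root.xpath("//i18n:*", namespaces={"i18n": I18N_NS}):
    el.getparent().replace(el, localize(el))

print(etree.tostring(xpath_root, encoding="unicode"))
```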
Stefan