[lxml-dev] Spacing and the presence of xml:space="preserve" - lxml - The Python XML Toolkit

24 Mar 2009

Hello all,

We are currently using this expression to obtain a plain text version
of the content inside a node:

For example:

>>> from lxml import etree
>>> string_xpath = etree.XPath("string()")
>>> string_xpath(etree.fromstring("<a>  asdf  <b/>fdsa  </a>"))
'  asdf  fdsa  '

This works well and returns the string as if xml:space="preserve" were
set; in other words, spacing is taken verbatim. We work on a file format
(XLIFF) where some of the spacing is very important. We generate such
files with xml:space="preserve" in the necessary places. Unfortunately,
not everybody generates such files, so we also need to handle the
normalised versions. If I instead use the XPath function
"normalize-space()", I get the normalised spacing:

>>> string_xpath = etree.XPath("normalize-space()")
>>> string_xpath(etree.fromstring("<a>  asdf  <b/>fdsa  </a>"))
'asdf fdsa'

but unfortunately it does this even if xml:space="preserve" is set:

>>> string_xpath(etree.fromstring('''<a xml:space="preserve">  asdf  <b/>fdsa  </a>'''))
'asdf fdsa'

Unfortunately, I don't see a way to get the correct version (normalised
by default, but with white-space preserved if xml:space="preserve" is
set). Do I have to handle the cases separately, or is there a way for
lxml to help me by just doing the right thing? I could special-case on
the node, but it would be a bit harder to know whether some xml:space
directive was given higher up in the tree. Or am I missing something in
XPath / lxml?
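
To make the "handle the cases separately" idea concrete, here is a rough
sketch of the special-casing, assuming the nearest xml:space attribute on
the node or one of its ancestors is the one that applies. The helper names
(effective_xml_space, node_text) are made up for illustration and are not
part of lxml. Note that it applies a single rule to the whole subtree, so
it ignores any xml:space overrides on descendants of the node.

```python
from lxml import etree

# Attribute name for xml:space in lxml's {namespace}local notation.
# The xml: prefix is bound to this namespace by the XML specification.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

string_xpath = etree.XPath("string()")
normalized_xpath = etree.XPath("normalize-space()")


def effective_xml_space(node):
    """Return the nearest xml:space value on node or its ancestors, else None."""
    current = node
    while current is not None:
        value = current.get(XML_SPACE)
        if value is not None:
            return value
        current = current.getparent()
    return None


def node_text(node):
    """Verbatim text if xml:space="preserve" is in effect, else normalised."""
    if effective_xml_space(node) == "preserve":
        return string_xpath(node)
    return normalized_xpath(node)
```

With this, node_text() on the plain <a> example above gives the normalised
string, and on the xml:space="preserve" example it gives the spacing
verbatim, also when the attribute sits on a parent element instead.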

Any help would be appreciated.

Friedel Wolff

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/video-virtaals-functionalit...
