[XML-SIG] PyXML XPath woes
list-matt at reprocessed.org
Wed Feb 11 04:04:48 EST 2004
On 8 Feb 2004, at 06:18, Mike Brown wrote:
> Some XPath hints for you here...
> 1. These predicates don't have to be chained. For example, instead of
> you could just say
> [not(ancestor::boxtexttable or ancestor::casestudy or
> ancestor::casetexttable or ancestor::checklist) and
> name(preceding-sibling::*) != 'H1']
Aha. The original reason for such hideously complex predicates was that
I was planning to generate the XPath from a much simpler source, or
maybe even a little app, so I wanted to keep things as crude as
possible. That's all gone out of the window now...
> 2. count(preceding-sibling::*) = 0
> is more succinctly written as not(preceding-sibling::*)
> However I think this predicate may have been interfering with your
> Take your first H2, for example... it does have a preceding sibling:
> standfirst, but you did in fact want it to be recorded as a boundary,
That's why there were two lines matching H2, one without a preceding
sibling predicate and one with...
> I am guessing that your boundary elements are those H1, H2 and H3
> that are not descendants of boxtexttable, casestudy, casetexttable, or
> checklist elements, and that are not immediately preceded by a
> heading element (H1 being higher than H2 being higher than H3).
> This expression is much simpler and I think will do what you want:
> (//H1|//H2|//H3)[not(ancestor::boxtexttable or
> ancestor::casestudy or
> ancestor::casetexttable or
> ancestor::checklist) and
Blimey, just a little simpler... :)
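For anyone trying this from Python today: the standard library's xml.etree.ElementTree implements only a small subset of XPath, with no ancestor:: or preceding-sibling:: axis, so the expression above can't be handed to it directly. Below is a minimal sketch of the same selection done by hand in one walk over the tree. It assumes the rule as Mike describes it: a boundary is an H1, H2 or H3 that is not inside boxtexttable, casestudy, casetexttable or checklist, and whose immediately preceding element sibling is not a higher-level heading. The "strictly higher" reading of that rule is my assumption, and HEADING_LEVEL, EXCLUDED and find_boundaries are hypothetical names. (In XPath itself, the immediately preceding sibling is preceding-sibling::*[1]; a bare name(preceding-sibling::*) reports the first such sibling in document order, i.e. the farthest one.)

```python
import xml.etree.ElementTree as ET

# Assumed rule (hypothetical names): heading ranks, plus the containers
# whose descendants never count as boundaries.
HEADING_LEVEL = {"H1": 1, "H2": 2, "H3": 3}
EXCLUDED = {"boxtexttable", "casestudy", "casetexttable", "checklist"}


def find_boundaries(root):
    """Return the boundary headings under root, in document order."""
    found = []

    def walk(elem, excluded):
        prev = None  # immediately preceding element sibling
        for child in elem:
            level = HEADING_LEVEL.get(child.tag)
            if level is not None and not excluded:
                prev_level = HEADING_LEVEL.get(prev.tag) if prev is not None else None
                # Assumption: a heading directly after a higher-ranked
                # heading (smaller number) is not a boundary.
                if prev_level is None or prev_level >= level:
                    found.append(child)
            # Everything below an excluded container is excluded too.
            walk(child, excluded or child.tag in EXCLUDED)
            prev = child

    walk(root, root.tag in EXCLUDED)
    return found


doc = ET.fromstring(
    "<article><standfirst/><H2>Intro</H2><p/>"
    "<H1>Part</H1><H2>Sub</H2>"
    "<casestudy><H2>Inside</H2></casestudy><H3>Late</H3></article>"
)
print([h.text for h in find_boundaries(doc)])  # ['Intro', 'Part', 'Late']
```

Sub is skipped because it directly follows an H1, and Inside because it sits in a casestudy. Intro still counts even though standfirst precedes it, which is the case of the first H2 discussed above.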
> If efficiency is critical, I'd look into other mechanisms involving a
> pass through the tree. For example, this XSLT stylesheet, which does a
> recursive copy-through (the "identity transform"; see the Copying
> section of the XSLT spec), is far more efficient for what you want to
> do, which is to generate a new document that has a boundary="true"
> attribute added to the appropriate elements:
I had originally wanted to avoid an intermediate XSL pass over the
content, but it became unavoidable: I am already doing an XSL pass, and
just moving the whole boundary-finding thing into XSL sounds mighty
sensible to me.
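Mike's stylesheet isn't reproduced in this archive. For comparison, here is a rough Python analogue of the same idea: copy the whole tree through, as the identity transform does, and add boundary="true" to the matching headings during one recursive walk, instead of evaluating a complex XPath expression per heading. It assumes the boundary rule Mike describes above, reading "preceded by a heading" as "directly after a higher-level heading". mark_boundaries, HEADING_LEVEL and EXCLUDED are hypothetical names, and deepcopy-then-annotate is a stand-in for XSLT template matching, not a translation of Mike's stylesheet.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed boundary rule (hypothetical names): heading ranks and the
# containers whose descendants are never boundaries.
HEADING_LEVEL = {"H1": 1, "H2": 2, "H3": 3}
EXCLUDED = {"boxtexttable", "casestudy", "casetexttable", "checklist"}


def mark_boundaries(root):
    """Return a copy of root with boundary="true" on each boundary heading."""
    result = copy.deepcopy(root)  # copy everything through; input is untouched

    def walk(elem, excluded):
        prev_level = None  # rank of the previous element sibling, if a heading
        for child in elem:
            level = HEADING_LEVEL.get(child.tag)
            if level is not None and not excluded:
                # Assumption: skip headings directly after a higher-ranked one.
                if prev_level is None or prev_level >= level:
                    child.set("boundary", "true")
            walk(child, excluded or child.tag in EXCLUDED)
            prev_level = level

    walk(result, result.tag in EXCLUDED)
    return result


src = ET.fromstring("<article><standfirst/><H1>A</H1><H2>B</H2><p/><H2>C</H2></article>")
out = mark_boundaries(src)
print(ET.tostring(out, encoding="unicode"))
```

Only the H1 and the second H2 get boundary="true"; the first H2 directly follows the H1, so it is left alone.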
Thanks for all your advice: very helpful indeed!
Matt Patterson | Typographer
<matt at emdash.co.uk> | http://www.emdash.co.uk/
<matt at reprocessed.org> | http://reprocessed.org/