[XML-SIG] PyXML XPath woes
Matt Patterson
list-matt at reprocessed.org
Wed Feb 11 04:04:48 EST 2004
On 8 Feb 2004, at 06:18, Mike Brown wrote:
> Some XPath hints for you here...
>
> 1. These predicates don't have to be chained. For example, instead of
> <snip>
> you could just say
>
> [not(ancestor::boxtexttable or ancestor::casestudy or
>      ancestor::casetexttable or ancestor::checklist)
>  and name(preceding-sibling::*[1]) != 'H1']
Aha. The original reason for such hideously complex predicates was that
I intended to generate the XPath from a much simpler source, or maybe
even a little app, so I wanted to keep things as crude as possible.
That's all gone out of the window now...
> 2. count(preceding-sibling::*) = 0
>
> is more succinctly written as not(preceding-sibling::*).
> However, I think this predicate may have been interfering with your results.
> Take your first H2, for example... it does have a preceding sibling:
> standfirst, but you did in fact want it to be recorded as a boundary, right?
That's why there were two lines matching H2, one without a preceding
sibling predicate and one with...
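To make the difference concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree (an assumption; the thread itself uses PyXML). ElementTree has no preceding-sibling axis, so each check looks at the element's position among its parent's children. The sample document and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first H2 follows a <standfirst>, not a heading.
DOC = "<article><standfirst/><H2/><p/><H1/><H2/></article>"

def has_no_preceding_sibling(parent, child):
    # Mirrors not(preceding-sibling::*), i.e. count(preceding-sibling::*) = 0:
    # true only for the first element child of its parent.
    return list(parent).index(child) == 0

def prev_sibling_is_not_heading(parent, child):
    # Mirrors not(starts-with(local-name(preceding-sibling::*[1]), 'H')):
    # preceding-sibling::*[1] is the nearest preceding element sibling,
    # and an element with no preceding sibling passes the test.
    siblings = list(parent)
    i = siblings.index(child)
    return i == 0 or not siblings[i - 1].tag.startswith("H")

root = ET.fromstring(DOC)
first_h2 = root[1]
print(has_no_preceding_sibling(root, first_h2))     # the old-style test
print(prev_sibling_is_not_heading(root, first_h2))  # the new-style test
```

On this sample the first H2 fails the "no preceding sibling" test (standfirst comes before it) but passes the "previous sibling is not a heading" test, so a single predicate can cover both of the H2 cases.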
> I am guessing that your boundary elements are those H1, H2 and H3 elements
> that are not descendants of boxtexttable, casestudy, casetexttable, or
> checklist elements, and that are not immediately preceded by a higher-level
> heading element (H1 being higher than H2 being higher than H3).
Exactly right.
> This expression is much simpler and I think will do what you want:
>
> (//H1|//H2|//H3)[not(ancestor::boxtexttable or
>                      ancestor::casestudy or
>                      ancestor::casetexttable or
>                      ancestor::checklist) and
>                  not(starts-with(local-name(preceding-sibling::*[1]),'H'))]
Blimey, just a little simpler... :)
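For anyone without an XPath engine handy, the same rule can be sketched in Python with the standard library's xml.etree.ElementTree (again an assumption, not PyXML). ElementTree's limited XPath support lacks the ancestor and preceding-sibling axes, so this version builds a child-to-parent map and walks it by hand. The constants and the function name are hypothetical. Like the XPath expression, it re-walks the ancestor chain for every candidate heading.

```python
import xml.etree.ElementTree as ET

# Hypothetical names for the containers and heading tags from the thread.
EXCLUDED = {"boxtexttable", "casestudy", "casetexttable", "checklist"}
HEADINGS = {"H1", "H2", "H3"}

def find_boundaries(root):
    """Return, in document order, the H1/H2/H3 elements selected by
    (//H1|//H2|//H3)[not(ancestor::boxtexttable or ...) and
                     not(starts-with(local-name(preceding-sibling::*[1]),'H'))]
    """
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def inside_excluded(el):
        # Walk up the ancestor chain, like the ancestor:: axis.
        p = parent_of.get(el)
        while p is not None:
            if p.tag in EXCLUDED:
                return True
            p = parent_of.get(p)
        return False

    def prev_is_heading(el):
        # Nearest preceding element sibling whose tag starts with 'H'
        # (as in the XPath, this also matches any other tag beginning with H).
        p = parent_of.get(el)
        if p is None:
            return False
        siblings = list(p)
        i = siblings.index(el)
        return i > 0 and siblings[i - 1].tag.startswith("H")

    return [el for el in root.iter()
            if el.tag in HEADINGS
            and not inside_excluded(el)
            and not prev_is_heading(el)]
```

Usage: `find_boundaries(ET.fromstring(xml_text))` returns the matching elements; the tags are compared literally, so namespaced documents would need `local-name()`-style stripping first.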
> If efficiency is critical, I'd look into other mechanisms involving a single
> pass through the tree. For example, this XSLT stylesheet, which does a
> recursive copy-through ("identity transform"; see the XSLT spec under
> Copying), is far more efficient for what you want to do, which is generate a
> new document that has a boundary="true" attribute added to the appropriate
> elements:
> <snip>
I had originally wanted to avoid an intermediate XSL pass over the
content, but it became unavoidable: I am already doing an XSL pass, so
moving the whole boundary-finding thing into XSL sounds mighty sensible
to me.
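The stylesheet itself isn't reproduced in this reply, so purely as an illustration of the single-pass idea (not Mike's XSLT), here is a Python sketch with xml.etree.ElementTree. It visits each element once, carries an "inside an excluded container" flag down the recursion, remembers the previous sibling at each level, and sets boundary="true" in place rather than copying to a new document. The names are hypothetical, and the root element itself is never considered.

```python
import xml.etree.ElementTree as ET

# Hypothetical names for the containers and heading tags from the thread.
EXCLUDED = {"boxtexttable", "casestudy", "casetexttable", "checklist"}
HEADINGS = {"H1", "H2", "H3"}

def mark_boundaries(el, inside_excluded=False):
    """One recursive pass: set boundary="true" on qualifying headings.

    `inside_excluded` is True when some ancestor of `el` (or `el` itself)
    is one of the excluded containers, so no per-node ancestor walk is needed.
    """
    inside = inside_excluded or el.tag in EXCLUDED
    prev = None  # previous element sibling at this level
    for child in el:
        prev_is_heading = prev is not None and prev.tag.startswith("H")
        if child.tag in HEADINGS and not inside and not prev_is_heading:
            child.set("boundary", "true")
        mark_boundaries(child, inside)
        prev = child
    return el
```

Usage: parse with `root = ET.fromstring(xml_text)`, call `mark_boundaries(root)`, then serialize with `ET.tostring(root, encoding="unicode")`. The trade-off against the XPath version is that state flows down the tree instead of being recomputed for each heading.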
Thanks for all your advice: very helpful indeed!
Best,
Matt
--
Matt Patterson | Typographer
<matt at emdash.co.uk> | http://www.emdash.co.uk/
<matt at reprocessed.org> | http://reprocessed.org/