[XML-SIG] PyXML XPath woes

Matt Patterson list-matt at reprocessed.org
Wed Feb 11 04:04:48 EST 2004


On 8 Feb 2004, at 06:18, Mike Brown wrote:

> Some XPath hints for you here...
>
> 1. These predicates don't have to be chained. For example, instead of
> <snip>
> you could just say
>
> [not(ancestor::boxtexttable or ancestor::casestudy or
> ancestor::casetexttable or ancestor::checklist)
> and name(preceding-sibling::*[1]) != 'H1']

Aha. The original reason for such hideously complex predicates was that 
I had a view to generating the XPath from a much simpler source, or 
maybe even a little app, so I wanted to keep things as crude as 
possible. That's all gone out of the window now...

> 2. count(preceding-sibling::*) = 0
>
> is more succinctly written as not(preceding-sibling::*)
> However I think this predicate may have been interfering with your
> results. Take your first H2, for example... it does have a preceding
> sibling: standfirst, but you did in fact want it to be recorded as a
> boundary, right?

That's why there were two lines matching H2, one without a preceding 
sibling predicate and one with...

> I am guessing that your boundary elements are those H1, H2 and H3
> elements that are not descendants of boxtexttable, casestudy,
> casetexttable, or checklist elements, and that are not immediately
> preceded by a higher-level heading element (H1 being higher than H2
> being higher than H3).

Exactly right.
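
For reference, the rule as described above can be sketched in plain Python with the standard library's ElementTree. This is only an illustration under assumptions: the function name, the constants and the sample document are made up for the example, element names are taken to be un-namespaced, and "higher-level" is read strictly (an H2 following another H2 still counts as a boundary).

```python
import xml.etree.ElementTree as ET

# Element names come from the thread; everything else here is hypothetical.
EXCLUDED_ANCESTORS = {"boxtexttable", "casestudy", "casetexttable", "checklist"}
HEADING_LEVEL = {"H1": 1, "H2": 2, "H3": 3}


def find_boundaries(root):
    """Return the boundary headings under root, in document order."""
    found = []
    # A heading used as the root has no ancestors or siblings, so it qualifies.
    if root.tag in HEADING_LEVEL:
        found.append(root)

    def walk(parent, inside_excluded):
        previous = None  # nearest preceding sibling element, if any
        for child in parent:
            level = HEADING_LEVEL.get(child.tag)
            if level is not None and not inside_excluded:
                prev_level = None if previous is None else HEADING_LEVEL.get(previous.tag)
                # Skip a heading that directly follows a higher-level heading.
                if prev_level is None or prev_level >= level:
                    found.append(child)
            walk(child, inside_excluded or child.tag in EXCLUDED_ANCESTORS)
            previous = child

    walk(root, root.tag in EXCLUDED_ANCESTORS)
    return found


# Made-up sample document, loosely modelled on the structure discussed here.
SAMPLE = """<doc>
  <H1>Title</H1>
  <H2>Directly under the title</H2>
  <standfirst>Intro</standfirst>
  <H2>First section</H2>
  <casestudy><H3>Inside a case study</H3></casestudy>
  <H3>Subsection</H3>
</doc>"""

boundaries = [e.text for e in find_boundaries(ET.fromstring(SAMPLE))]
```

With this sample, the H2 directly after the H1 and the H3 inside casestudy are left out, while the H2 after standfirst is kept.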

> This expression is much simpler and I think will do what you want:
>
> (//H1|//H2|//H3)[not(ancestor::boxtexttable or
>                      ancestor::casestudy or
>                      ancestor::casetexttable or
>                      ancestor::checklist) and
>                  not(starts-with(local-name(preceding-sibling::*[1]),'H'))]

Blimey, just a little simpler... :)

> If efficiency is critical, I'd look into other mechanisms involving a
> single pass through the tree. For example, this XSLT stylesheet, which
> does a recursive copy-through ("identity transform", see XSLT spec
> under Copying) is far more efficient for what you want to do, which is
> generate a new document that has a boundary="true" attribute added to
> the appropriate elements:

I had originally wanted to avoid an intermediate XSL pass over the 
content, but it became unavoidable. Since I am already doing an XSL 
pass, moving the whole boundary-finding thing into XSL sounds mighty 
sensible to me.
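
Outside XSLT, the same single-pass idea can be sketched in Python with the standard library. This is not the stylesheet from the quoted message, just a rough equivalent under assumptions: the function name is hypothetical, element names are assumed to carry no namespace, and the test on the preceding sibling mirrors the simplified starts-with(..., 'H') predicate quoted above, so any preceding element whose name begins with "H" suppresses the boundary.

```python
import xml.etree.ElementTree as ET

# Element names come from the thread; the function name is hypothetical.
EXCLUDED = {"boxtexttable", "casestudy", "casetexttable", "checklist"}
HEADINGS = {"H1", "H2", "H3"}


def mark_boundaries(xml_text):
    """Add boundary="true" to qualifying headings, visiting each element once."""
    root = ET.fromstring(xml_text)
    # A heading used as the root has no ancestors or preceding siblings.
    if root.tag in HEADINGS:
        root.set("boundary", "true")
    # Stack of (element, has_excluded_ancestor_or_is_excluded) pairs.
    stack = [(root, root.tag in EXCLUDED)]
    while stack:
        parent, excluded = stack.pop()
        previous_name = ""  # name of the nearest preceding sibling element
        for child in parent:
            if (child.tag in HEADINGS and not excluded
                    and not previous_name.startswith("H")):
                child.set("boundary", "true")
            stack.append((child, excluded or child.tag in EXCLUDED))
            previous_name = child.tag
    return ET.tostring(root, encoding="unicode")


marked = mark_boundaries("<doc><H1>A</H1><H2>B</H2><p/><H2>C</H2></doc>")
```

Here the first H2 follows an H1 and is left alone, while the H1 and the second H2 pick up the attribute.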

Thanks for all your advice: very helpful indeed!

Best,

Matt


-- 
    Matt Patterson | Typographer
    <matt at emdash.co.uk> | http://www.emdash.co.uk/
    <matt at reprocessed.org> | http://reprocessed.org/
