[XML-SIG] PyXML XPath woes
list-matt at reprocessed.org
Wed Feb 11 09:35:28 EST 2004
On 8 Feb 2004, at 02:49, Thomas B. Passin wrote:
> Matt Patterson wrote:
>> I've got an XML file in which I want to locate all elements with the
>> attribute boundary set to 'true'. I use the following XPath
>> like so:
>> boundaryFinder = Compile("//*[@boundary='true']")
>> context = Context(self.document)
>> # evaluate the expression and get a nodeList
>> boundaryNodes = boundaryFinder.evaluate(context)
>> But the results of the XPath do not return all the nodes which match!
> How many nodes did you get and how many are actually there?
Okay, this is weird: the XPath _is_ returning 47 nodes, as it should. I
should have checked more closely: the final paginated output contained
pages with several elements carrying boundary="true" attributes, so I
panicked and assumed that XPath was to blame.
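
For anyone wanting to double-check a count like this independently of
PyXML, here is a minimal sketch using the standard library's
xml.etree.ElementTree rather than the PyXML code above. The sample
document and its element names are made up for illustration; only the
boundary attribute comes from my real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """<doc>
  <page boundary="true"><para>one</para></page>
  <page><para boundary="true">two</para><para boundary="false">three</para></page>
  <page boundary="true"/>
</doc>"""

root = ET.fromstring(SAMPLE)

# ElementTree's limited XPath: every descendant of the root with boundary="true".
# Note that ".//*" does not include the root element itself.
xpath_hits = root.findall(".//*[@boundary='true']")

# Independent cross-check: walk the whole tree and test the attribute by hand.
walk_hits = [el for el in root.iter() if el.get("boundary") == "true"]

print(len(xpath_hits), len(walk_hits))
```

If the two numbers agree, the expression is probably not the culprit and
the problem lies further down the pipeline.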
This seems to have been caused by heinous problems with the 4DOM DOM
Range implementation: either the range is storing its boundary points
very strangely, or the range.cloneContents() method is simply bonkers.
The range causing the problem has start and end points with the same
parent, and the cloneContents() method is returning all of that
parent node's children. I had to do some poking, but it seems that when
the start point of a range is a child of the range's common ancestor
node and the end point is a grandchild or deeper descendant, the
cloneContents() method of the range returns all preceding siblings of
the last part of the end point's ancestor chain:
If the start and end-points of a range were set to <start/> and <end/>
Then range.cloneContents() would return:
Which is clearly bonkers.
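
To make the expected behaviour concrete, here is a rough sketch using
xml.dom.minidom, which has no Range support, so this is not 4DOM code
and not a real cloneContents(). It only lists which children of the
common ancestor a range from one node to another ought to touch. The
document, the element names and the helper functions are all
hypothetical.

```python
from xml.dom import minidom

def ancestors(node):
    """Return [node, parent, grandparent, ...] up to the document node."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parentNode
    return chain

def child_under(chain, common):
    """Return the node in the chain whose parent is the common ancestor."""
    for node in chain:
        if node.parentNode is common:
            return node
    return None

def spanned_children(start, end):
    """Children of the common ancestor touched by a range from start to end."""
    start_chain = ancestors(start)
    end_chain = ancestors(end)
    # Deepest proper ancestor of start that is also an ancestor of end.
    common = next(n for n in start_chain[1:] if any(n is m for m in end_chain))
    first = child_under(start_chain, common)
    last = child_under(end_chain, common)
    result = []
    node = first
    while node is not None:
        result.append(node)
        if node is last:
            break
        node = node.nextSibling
    return result

# Hypothetical document: <start/> is a child of the common ancestor,
# <end/> is a deeper descendant.
doc = minidom.parseString(
    "<section><a/><b/><start/><c/>"
    "<wrap><inner><end/></inner></wrap><d/></section>"
)
start = doc.getElementsByTagName("start")[0]
end = doc.getElementsByTagName("end")[0]

print([n.tagName for n in spanned_children(start, end)])
```

In this made-up document the span runs from <start/> to the <wrap/>
element containing <end/>, so <a/> and <b/>, which come before the
start point, should not be part of it. A real cloneContents() would
also clone <wrap/> only partially, which this sketch ignores.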
> You have an encoding problem with the file you linked to. It is
> encoded in iso-8859-1 but with no encoding declaration it is treated
> as utf-8. Unfortunately there are some non-utf-8 characters in it, so
> it is not well-formed. Thus any results you get would be suspect. In
> fact, it should not parse successfully at all.
Hmmm. The Frame XML file (and thus its entities) claims to be utf-8,
and I've had no problems with it. It could be a file-transfer issue,
I suppose. I'll have to investigate more closely. Thanks for the heads up!
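
One quick way to investigate is to try decoding the raw bytes as utf-8
and see where, if anywhere, it fails. A minimal sketch follows; the
helper name and the sample bytes are made up, and in practice the bytes
would come from reading the file in binary mode.

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None

# Hypothetical samples: the same text encoded two different ways.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")  # e-acute is the single byte 0xE9
utf8_bytes = "caf\u00e9".encode("utf-8")         # e-acute is the two bytes 0xC3 0xA9

print(first_invalid_utf8_offset(latin1_bytes))
print(first_invalid_utf8_offset(utf8_bytes))
```

An offset for the first sample and None for the second would suggest
that a file which fails this check was saved or transferred in some
other encoding, such as iso-8859-1, despite what it claims.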
Thanks for all your help,
Matt Patterson | Typographer
<matt at emdash.co.uk> | http://www.emdash.co.uk/
<matt at reprocessed.org> | http://reprocessed.org/