[XML-SIG] XPath's reliance on id()

14 Mar 2002 18:43:12 +0100

Martijn Faassen <faassen@vet.uu.nl> writes:

> > ??? x,y ??? Python-Object: x = y ??? hash(x) = hash(y)
> 
> This seems to have gotten somewhat mangled here; I get three question
> marks and this must've been some symbol?

This was an attempt to put FOR ALL, ELEMENT OF, and RIGHT ARROW into
an email message; it seems it failed.

> Anyway, perhaps the notion of equality is what we need; in my mind two
> objects can stand in for the same DOM node but not be the same object;
> they're equal but not identical.

Strictly speaking, the DOM spec does not guarantee equality of nodes.
If anything, it guarantees that identity works.

> The notion for equality in DOM nodes is actually supported by the 
> DOM level 3 working draft:
> 
> """
> isSameNode (introduced in DOM Level 3)

It is the notion of "sameness" that is supported. The Python mapping
could mandate that == for nodes holds iff isSameNode holds, but it
currently doesn't.

Notice that they also have isEqualNode; this is *not* what we want.
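
To illustrate what mandating "== iff isSameNode" could look like for
proxy objects, here is a minimal sketch. NodeProxy is a hypothetical
class, not part of any DOM package, and hashing on id() of the wrapped
node assumes the implementation's isSameNode is plain object identity
(as it is in minidom).

```python
from xml.dom.minidom import parseString

class NodeProxy:
    """Hypothetical proxy; several proxies may wrap one DOM node."""

    def __init__(self, node):
        self.node = node

    def __eq__(self, other):
        if not isinstance(other, NodeProxy):
            return NotImplemented
        # Equality of proxies follows sameness of the wrapped nodes.
        return self.node.isSameNode(other.node)

    def __hash__(self):
        # Assumption: isSameNode is identity, so id() is consistent with ==.
        return id(self.node)

doc = parseString("<r><a>x</a><a>x</a></r>")
first, second = doc.documentElement.childNodes

p1, p2 = NodeProxy(first), NodeProxy(first)   # two proxies, one node
p3 = NodeProxy(second)                        # equal content, other node

same_node_equal = (p1 == p2)
equal_content_equal = (p1 == p3)
distinct_in_set = len({p1, p2, p3})
```

The two proxies for the first element compare equal and collapse to one
set entry, while the proxy for the second element stays distinct even
though its node has the same name and content.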

> Then again, I just found out they have a compareTreePosition()
> method added to the Node interface that we could use for sorting
> purposes, I think..

Indeed. Then it would be up to the DOM implementation to make that
happen. This sounds like the cleanest approach to me.

> But that is in fact what is needed in this case; I have many different
> proxy objects which may all map to the same actual DOM node, so they'd
> have the same __hash__. But perhaps the other implications of __hash__
> break that. What about supplying a 'key' attribute, anticipating DOM 
> level 3 vague implications? :)

If we mandate DOM3 features, I think we should use the feature that
apparently was explicitly added for XSLT document order:
compareTreePosition.
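
For an implementation that does not offer compareTreePosition, document
order can be approximated from outside the DOM. The following is only a
sketch: document_order is a hypothetical helper, it walks element and
text children in pre-order and ignores attributes, and the table it
builds is only valid as long as the tree is not modified.

```python
from xml.dom.minidom import parseString

def document_order(root):
    """Map id(node) to its position in a pre-order walk from root."""
    order = {}
    stack = [root]
    while stack:
        node = stack.pop()
        order[id(node)] = len(order)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.childNodes))
    return order

doc = parseString("<r><a><b/></a><c/></r>")
r = doc.documentElement
a, c = r.childNodes
b = a.firstChild

order = document_order(doc)
shuffled = [c, b, r, a]
sorted_nodes = sorted(shuffled, key=lambda n: order[id(n)])
names = [n.nodeName for n in sorted_nodes]
```

A native compareTreePosition would avoid the extra table and would stay
correct across tree modifications, which is why leaving it to the DOM
implementation seems preferable.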

> I don't think it's reasonable to give those inner nodes the same
> hash value at all. They're not the same node, and shouldn't hash the
> same way.

They are equal nodes (in the sense of isEqualNode), so I see no reason
why the hashes should be different. If I were to implement a hash of a
node, I'd use the formula

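def node_hash(node):
  # Named node_hash so that the builtin hash() is not shadowed.
  res = hash(node.nodeType) + hash(node.nodeName)
  for c in node.childNodes:
    res += node_hash(c)  # recurse into the children
  return res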

> I don't see any reason to make two different nodes hash the same way just
> because they have the same name. 

They have the same name, the same type, and the same content. They
really are equal.
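
As a small check of that claim, the sketch below applies the same
formula to two distinct elements with the same name, type, and content,
using minidom. The name structural_hash is only chosen for this
example.

```python
from xml.dom.minidom import parseString

def structural_hash(node):
    """Hash built from node type, node name, and the children's hashes."""
    res = hash(node.nodeType) + hash(node.nodeName)
    for child in node.childNodes:
        res += structural_hash(child)
    return res

doc = parseString("<r><a>x</a><a>x</a></r>")
first, second = doc.documentElement.childNodes

same_hash = structural_hash(first) == structural_hash(second)
same_node = first.isSameNode(second)
```

Here same_hash is true although same_node is false. Since the formula
ignores text values, <a>x</a> and <a>y</a> would hash the same as well;
that is only a collision, which a hash function is allowed to have.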

Regards,
Martin