[XML-SIG] XPath's reliance on id()

Martijn Faassen faassen@vet.uu.nl
Thu, 14 Mar 2002 13:27:45 +0100


Martin v. Loewis wrote:
> Martijn Faassen <faassen@vet.uu.nl> writes:
> 
> > > I think this approach (for determining document order) really does
> > > need the notion of identity, so it's hard to believe that this could
> > > actually work.
> > 
> > Doesn't __hash__ imply there is a notion of identity?
> 
> No, it implies equality. Actually, it is the other way 'round:
> equality implies equal hash values:
> 
> ??? x,y ??? Python-Object: x = y ??? hash(x) = hash(y)

This seems to have gotten somewhat mangled here; I get three question
marks and this must've been some symbol?

Anyway, perhaps the notion of equality is what we need; in my mind two
objects can stand in for the same DOM node but not be the same object;
they're equal but not identical.

The notion for equality in DOM nodes is actually supported by the 
DOM level 3 working draft:

"""
isSameNode (introduced in DOM Level 3)
    Returns whether this node is the same node as the given one.
    This method provides a way to determine whether two Node references 
    returned by the implementation reference the same object. When two
    Node references are references to the same object, even if through a
    proxy, the references may be used completely interchangeably, such that
    all attributes have the same values and calling the same DOM method on
    either reference always has exactly the same effect.

    Issue isSameNode-1:
        Do we really want to make this different from equals?
    Resolution: Yes, change name from isIdentical to isSameNode.
       (Telcon 4 Jul 2000).
    Issue isSameNode-2:
       Is this really needed if we provide a unique key?
    Resolution: Yes, because the key is only unique within a document.
       (F2F 2 Mar 2001).
    Issue isSameNode-3:
       Definition of 'sameness' is needed.

The issues are especially interesting, as they're dealing with the same
issues we're dealing with now. If I read the text correctly it also implies
they are thinking of a notion of a unique 'key', similar to the current
id(). This is only implied by some comments in the draft though, and not 
actually specified anywhere.

Then again, I just found out they have a compareTreePosition() method added
to the Node interface that we could use for sorting purposes, I think..

> >   If a class does not define a __cmp__() method it should not define a 
> >   __hash__() operation either;
> > 
> > I don't understand why this must be so.
> 
> That is surprising indeed; I had expected that this it is the other
> way 'round: If you implement __cmp__, you also need to implement
> __hash__.

It does say that too. :)

> > The point is simply that id() is not customizable, and hash() is.
> 
> The point is that hash() is tied to cmp() through the dictionary
> implementation, and that conceptionally, hash() maps the large set of
> objects to the much smaller set of numbers, in a unidirectional way
> (i.e. different objects may have equal hash values). Typically, the
> hash of an objects is computed from its state.

But that is in fact what is needed in this case; I have many different
proxy objects which may all map to the same actual DOM node, so they'd
have the same __hash__. But perhaps the other implications of __hash__
break that. What about supplying a 'key' attribute, anticipating DOM 
level 3 vague implications? :)

> > > >   * we need to alter the code in xpath/Util.py so as not to rely on
> > > >     id() anymore but to use the node objects as hash keys directly. 
> > > 
> > > Then doing what? What happens if you have two nodes that have the same
> > > hash value, but which are different?
> > 
> > I reconsidered later; I simply want to use hash() instead of id. In that
> > case you don't have a problem with two objects having the same hash value
> > but are different. If they have the same hash value they're identical.
> 
> Not at all. Consider
> 
> <outer>
>   <inner/>
>   <inner/>
> </outer>
> 
> The two inner elements may reasonably have the same hash value. They
> must not be considered identical, since one is before the other in
> document order.

I don't think it's reasonable to give those inner nodes the same hash
value at all. They're not the same node, and shouldn't hash the same way.
I don't see any reason to make two different nodes hash the same way just
because they have the same name. I'd like to make two node objects hash
the same way if they represent the same node in the underlying XML 
structure. Then again, perhaps we should just move ahead on compareTreePosition
or something..

Regards,

Martijn