[Python-Dev] gc ideas -- sparse memory

Stephen J. Turnbull stephen at xemacs.org
Sat Dec 4 15:22:54 CET 2010


Quotes are out of order.

"Martin v. Löwis" writes:

 > No, it's not. java.lang.Object.hashCode() is equivalent to Python's
 > hash(). In particular, for both of these, the following requirement
 > holds: object that *compare* equal must hash equal.

Aha.  I see, now.  That is indeed a different concept.  Mutable
objects retain their identities across mutations.

 > > Two distinct objects could have the same hash code, and a
 > > further test is needed to distinguish them.
 > 
 > Correct. However, references with different identity hash codes will
 > never refer to the same object.

Why is useful to expose an identity hash?  AFAICS it is *only* useful
in building an identity hash table.  If so, why not just provide id()
or the is operator or both and be done with it?

This is different from exposing the value hash AFAICS, where it may be
possible to optimize value hashing of composite objects by computing
their hashes from the hashes of subobjects, perhaps omitting some
subobjects that don't affect equality (eg, Emacs strings may carry
display properties that are ignored when they are compared as
strings).

With identity, I don't see much point in leaving that up to the
programmer.  In implementations with stable object addresses, the
address is a fine implementation of identity_hash() -- perfect, in
fact.  Otherwise, you either have to omit mutable members from the
computation or store the hashcode in the object.  In the former
situation, you will likely get very coarse hashing, and the set of
immutable members (if any ;-) can be computed by the compiler.  In the
latter, you may as well store an id directly and avoid the whole hash
concept for that type (except as a space optimization, but that again
implies coarse hashing).

 > [F]or the identity hash code, or Python's id function: objects that
 > compare equal may still have different ids, or different java
 > identity hash codes.

ISTM that's no problem, you have to do an extra comparison for
identity when the codes are equal either way.  Of course the identity
hash code is a more precise optimization, but that doesn't make hash()
unusable for this purpose (cf Sun's built-in implementation of
IdentityHashCode that always returns 2).

The problem presumably lies in the fact that there are "id()-ifiable"
objects that, because they are mutable, either are unhashable (Python
hash) or would have an unstable value hash (Lisp sxhash).

Ie, AFAICS

def identity_hash (thing):
    "Wrapper for hash() suitable for building an identity hash table."
    try:
        return hash(thing)
    except TypeError:
        return 0

would be a fine implementation for Python.<wink>

 > > Like hash(), IdentityHashCode doesn't make any promises about identity
 > > at all.
 > 
 > It sure does: calling identityHashCode on the very same object twice
 > will always return the same value - even after a garbage
 > collection.

Well, if you want to put it that way, so does hash().  I read Steven
as placing more emphasis on "like hash()" than on "at all".



More information about the Python-Dev mailing list