numarray: Possible hash collision problem
    hash(numarray.arange(1000)) == hash(numarray.arange(10000))

The hash value changes each time I enter the Python interpreter. I have always assumed that hashing was deterministic. Is it?
"Edward C. Jones" <edcjones@comcast.net> writes:
Not surprising. I also get this:

    hash(object()) == hash(object())

Looking through the source, I think the hash for an array is determined by the object base class, and hence is the id() of the array. The code above can be written longhand as

    a = numarray.arange(1000)
    ha = hash(a)    # in this case, hash(a) == id(a)
    del a
    b = numarray.arange(10000)
    hb = hash(b)    # in this case, hash(b) == id(b)
    del b
    ha == hb

It's those (implicit) del statements that mean that a and b are stored at the same location in memory, and hence have the same id(): no other object is created in the interpreter between when a is deleted and b is created.

Basically, the id() of an object is guaranteed to be unique *amongst all active objects*. It is _not_ guaranteed to be different from the id() of objects that have been created and destroyed.

This will return False:

    a = numarray.arange(1000)
    b = numarray.arange(10000)
    hash(a) == hash(b)

as a and b still both exist.

Since arrays are mutable, there's no good way to get a content-based hash.

--
David M. Cooke
http://arbutus.physics.mcmaster.ca/dmc/
cookedm@physics.mcmaster.ca
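The behaviour described in the reply can be reproduced without numarray. The following is a minimal sketch, assuming CPython and a hypothetical class `Blob` that stands in for an array type inheriting the default hash from `object`. Note that in current CPython the default hash is derived from the object's address rather than being equal to `id()`, so the sketch compares hashes with each other instead of with `id()`.

```python
class Blob:
    """Hypothetical stand-in for an array type that keeps object's default hash."""

    def __init__(self, n):
        self.data = list(range(n))


def hashes_of_live_objects():
    # Both objects are alive at the same time, so they occupy different
    # memory and their identity-based hashes differ.
    a = Blob(1000)
    b = Blob(10000)
    return hash(a), hash(b)


def hashes_of_temporaries():
    # Each temporary is released as soon as hash() returns (on CPython),
    # so the second object may be allocated at the same address.
    ha = hash(Blob(1000))
    hb = hash(Blob(10000))
    return ha, hb


live_a, live_b = hashes_of_live_objects()
print("live objects collide:", live_a == live_b)

temp_a, temp_b = hashes_of_temporaries()
# Frequently True on CPython, but this is an allocator detail, not a guarantee.
print("temporaries collide:", temp_a == temp_b)

# The default hash ignores contents: mutating the object does not change it.
c = Blob(3)
hash_before = hash(c)
c.data.append(99)
hash_after = hash(c)
print("hash unchanged after mutation:", hash_before == hash_after)
```

The first comparison is the reliable one: two objects that exist simultaneously cannot share an identity. Whether the temporaries collide depends on how the interpreter reuses freed memory, which is why the result can look non-deterministic.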
participants (2)
- cookedm@physics.mcmaster.ca
- Edward C. Jones