[Python-checkins] python/dist/src/Objects setobject.c,1.24,1.25

rhettinger at users.sourceforge.net rhettinger at users.sourceforge.net
Thu Jun 10 18:41:50 EDT 2004


Update of /cvsroot/python/python/dist/src/Objects
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30622

Modified Files:
	setobject.c 
Log Message:
Futher improvements to frozenset hashing (based on Yitz Gale's battery of 
tests which nicely highly highlight weaknesses).

* Initial value is now a large prime.
* Pre-multiply by the set length to add one more basis of differentiation.
* Work a bit harder inside the loop to scatter bits from sources that
  may have closely spaced hash values.

All of this is necessary to make up for keep the hash function commutative.
Fortunately, the hash value is cached so the call to frozenset_hash() will
only occur once per set.



Index: setobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/setobject.c,v
retrieving revision 1.24
retrieving revision 1.25
diff -C2 -d -r1.24 -r1.25
*** setobject.c	10 Jun 2004 21:38:41 -0000	1.24
--- setobject.c	10 Jun 2004 22:41:48 -0000	1.25
***************
*** 664,681 ****
  	PyObject *key, *value;
  	int pos = 0;
! 	long hash = 1905176217L;
  
  	if (so->hash != -1)
  		return so->hash;
  
  	while (PyDict_Next(so->data, &pos, &key, &value)) {
! 		/* Multiplying by a large prime increases the bit dispersion for
! 		   closely spaced hash values.  The is important because some
! 		   use cases have many combinations of a small number of 
! 		   elements with nearby hashes so that many distinct combinations
! 		   collapse to only a handful of distinct hash values. */
! 		hash ^= PyObject_Hash(key) * 3644798167u;
  	}
! 	hash *= 69069L;
  	if (hash == -1)
  		hash = 590923713L;
--- 664,683 ----
  	PyObject *key, *value;
  	int pos = 0;
! 	long hash = 1927868237L;
  
  	if (so->hash != -1)
  		return so->hash;
  
+ 	hash *= (PyDict_Size(so->data) + 1);
  	while (PyDict_Next(so->data, &pos, &key, &value)) {
! 		/* Work to increase the bit dispersion for closely spaced hash
! 		   values.  The is important because some use cases have many 
! 		   combinations of a small number of elements with nearby 
! 		   hashes so that many distinct combinations collapse to only 
! 		   a handful of distinct hash values. */
! 		long h = PyObject_Hash(key);
! 		hash ^= (h ^ (h << 16) ^ 89869747L)  * 3644798167u;
  	}
! 	hash = hash * 69069L + 907133923L;
  	if (hash == -1)
  		hash = 590923713L;




More information about the Python-checkins mailing list