[Python-Dev] One more dict trick
Tim Peters
tim.one@home.com
Thu, 31 May 2001 20:24:01 -0400
Another version of the patch attached, a bit faster and with a large new
comment block explaining it. It's looking good! As I hope the new comments
make clear, nothing about this approach is "a mystery" -- there are
explainable reasons for each fiddly bit. This gives me more confidence in
it than in the previous approach, and, indeed, it turned out that when I
*thought* "hmm! I bet this change would be a little faster!", it actually
was <wink>.
Index: Objects/dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.96
diff -c -r2.96 dictobject.c
*** Objects/dictobject.c 2001/05/27 07:39:22 2.96
--- Objects/dictobject.c 2001/06/01 00:17:07
***************
*** 12,123 ****
*/
#define MINSIZE 8

! /* define this out if you don't want conversion statistics on exit */
#undef SHOW_CONVERSION_COUNTS

/*
! Table of irreducible polynomials to efficiently cycle through
! GF(2^n)-{0}, 2<=n<=30. A table size is always a power of 2.
! For a table size of 2**i, the polys entry is 2**i + j for some j in 1 thru
! 2**i-1 inclusive. The polys[] entries here happen to add in the smallest j
! values "that work". Work means this: given any integer k in 1 thru 2**i-1
! inclusive, a poly works if & only if repeating this code:
!     print k
!     k <<= 1
!     if k >= 2**i:
!         k ^= poly
! prints every integer in 1 thru 2**i-1 inclusive exactly once before printing
! k a second time. Theory can be used to find such polys efficiently, but the
! operational defn. of "works" is sufficient to find them in reasonable time
! via brute force program (hint: any poly that has an even number of 1 bits
! cannot work; ditto any poly with low bit 0; exploit those).
!
! Some major subtleties: Most hash schemes depend on having a "good" hash
! function, in the sense of simulating randomness. Python doesn't: some of
! its hash functions are trivial, such as hash(i) == i for ints i (excepting
! i == -1, because -1 is the "error occurred" return value from tp_hash).
!
! This isn't necessarily bad! To the contrary, that our hash tables are powers
! of 2 in size, and that we take the low-order bits as the initial table index,
! means that there are no collisions at all for dicts indexed by a contiguous
! range of ints. This is "better than random" behavior, and that's very
! desirable.
!
! On the other hand, when collisions occur, the tendency to fill contiguous
! slices of the hash table makes a good collision resolution strategy crucial;
! e.g., linear probing is right out.
!
! Reimer Behrends contributed the idea of using a polynomial-based approach,
! using repeated multiplication by x in GF(2**n) where a polynomial is chosen
! such that x is a primitive root. This visits every table location exactly
! once, and the sequence of locations probed is highly non-linear.
!
! The same is also largely true of quadratic probing for power-of-2 tables, of
! the specific
!
!     (i + comb(1, 2)) mod size
!     (i + comb(2, 2)) mod size
!     (i + comb(3, 2)) mod size
!     (i + comb(4, 2)) mod size
!     ...
!     (i + comb(j, 2)) mod size
!
! flavor. The polynomial approach "scrambles" the probe indices better, but
! more importantly allows to get *some* additional bits of the hash code into
! play via computing the initial increment, thus giving a weak form of double
! hashing. Quadratic probing cannot be extended that way (the first probe
! offset must be 1, the second 3, the third 6, etc).
!
! Christian Tismer later contributed the idea of using polynomial division
! instead of multiplication. The problem is that the multiplicative method
! can't get *all* the bits of the hash code into play without expensive
! computations that slow down the initial index and/or initial increment
! computation. For a set of keys like [i << 16 for i in range(20000)], under
! the multiplicative method the initial index and increment were the same for
! all keys, so every key followed exactly the same probe sequence, and so
! this degenerated into a (very slow) linear search. The division method uses
! all the bits of the hash code naturally in the increment, although it *may*
! visit locations more than once until such time as all the high bits of the
! increment have been shifted away. It's also impossible to tell in advance
! whether incr is congruent to 0 modulo poly, so each iteration of the loop has
! to guard against incr becoming 0. These are minor costs, as we usually don't
! get into the probe loop, and when we do we usually get out on its first
! iteration.
*/

- static long polys[] = {
- /* 4 + 3, */ /* first active entry if MINSIZE == 4 */
- 8 + 3, /* first active entry if MINSIZE == 8 */
- 16 + 3,
- 32 + 5,
- 64 + 3,
- 128 + 3,
- 256 + 29,
- 512 + 17,
- 1024 + 9,
- 2048 + 5,
- 4096 + 83,
- 8192 + 27,
- 16384 + 43,
- 32768 + 3,
- 65536 + 45,
- 131072 + 9,
- 262144 + 39,
- 524288 + 39,
- 1048576 + 9,
- 2097152 + 5,
- 4194304 + 3,
- 8388608 + 33,
- 16777216 + 27,
- 33554432 + 9,
- 67108864 + 71,
- 134217728 + 39,
- 268435456 + 9,
- 536870912 + 5,
- 1073741824 + 83
- /* 2147483648 + 9 -- if we ever boost this to unsigned long */
- };
-
/* Object used as dummy key to fill deleted entries */
static PyObject *dummy; /* Initialized by first call to newdictobject() */

--- 12,117 ----
*/
#define MINSIZE 8

! /* Define this out if you don't want conversion statistics on exit. */
#undef SHOW_CONVERSION_COUNTS

+ /* See large comment block below. This must be >= 1. */
+ #define PERTURB_SHIFT 5
+
/*
! Major subtleties ahead: Most hash schemes depend on having a "good" hash
! function, in the sense of simulating randomness. Python doesn't: its most
! important hash functions (for strings and ints) are very regular in common
! cases:
!
!     >>> map(hash, (0, 1, 2, 3))
!     [0, 1, 2, 3]
!     >>> map(hash, ("namea", "nameb", "namec", "named"))
!     [-1658398457, -1658398460, -1658398459, -1658398462]
!     >>>
!
! This isn't necessarily bad! To the contrary, in a table of size 2**i, taking
! the low-order i bits as the initial table index is extremely fast, and there
! are no collisions at all for dicts indexed by a contiguous range of ints.
! The same is approximately true when keys are "consecutive" strings. So this
! gives better-than-random behavior in common cases, and that's very desirable.
!
! OTOH, when collisions occur, the tendency to fill contiguous slices of the
! hash table makes a good collision resolution strategy crucial. Taking only
! the last i bits of the hash code is also vulnerable: for example, consider
! [i << 16 for i in range(20000)] as a set of keys. Since ints are their own
! hash codes, and this fits in a dict of size 2**15, the last 15 bits of every
! hash code are all 0: they *all* map to the same table index.
!
! But catering to unusual cases should not slow the usual ones, so we just take
! the last i bits anyway. It's up to collision resolution to do the rest. If
! we *usually* find the key we're looking for on the first try (and, it turns
! out, we usually do -- the table load factor is kept under 2/3, so the odds
! are solidly in our favor), then it makes best sense to keep the initial index
! computation dirt cheap.
!
! The first half of collision resolution is to visit table indices via this
! recurrence:
!
!     j = ((5*j) + 1) mod 2**i
!
! For any initial j in range(2**i), repeating that 2**i times generates each
! int in range(2**i) exactly once (see any text on random-number generation for
! proof). By itself, this doesn't help much: like linear probing (setting j
! += 1, or j -= 1, on each loop trip), it scans the table entries in a fixed
! order. This would be bad, except that's not the only thing we do, and it's
! actually *good* in the common cases where hash keys are consecutive. In an
! example that's really too small to make this entirely clear, for a table of
! size 2**3 the order of indices is:
!
!     0 -> 1 -> 6 -> 7 -> 4 -> 5 -> 2 -> 3 -> 0 [and here it's repeating]
!
! If two things come in at index 5, the first place we look after is index 2,
! not 6, so if another comes in at index 6 the collision at 5 didn't hurt it.
! Linear probing is deadly in this case because there the fixed probe order
! is the *same* as the order consecutive keys are likely to arrive. But it's
! extremely unlikely hash codes will follow a 5*j+1 recurrence by accident,
! and certain that consecutive hash codes do not.
!
! The other half of the strategy is to get the other bits of the hash code
! into play. This is done by initializing a (unsigned) vrbl "perturb" to the
! full hash code, and changing the recurrence to:
!
!     j = (5*j) + 1 + perturb;
!     perturb >>= PERTURB_SHIFT;
!     use j % 2**i as the next table index;
!
! Now the probe sequence depends (eventually) on every bit in the hash code,
! and the pseudo-scrambling property of recurring on 5*j+1 is more valuable,
! because it quickly magnifies small differences in the bits that didn't affect
! the initial index. Note that because perturb is unsigned, if the recurrence
! is executed often enough perturb eventually becomes and remains 0. At that
! point (very rarely reached) the recurrence is on (just) 5*j+1 again, and
! that's certain to find an empty slot eventually (since it generates every int
! in range(2**i), and we make sure there's always at least one empty slot).
!
! Selecting a good value for PERTURB_SHIFT is a balancing act. You want it
! small so that the high bits of the hash code continue to affect the probe
! sequence across iterations; but you want it large so that in really bad cases
! the high-order hash bits have an effect on early iterations. 5 was "the
! best" in minimizing total collisions across experiments Tim Peters ran (on
! both normal and pathological cases), but 4 and 6 weren't significantly worse.
!
! Historical: Reimer Behrends contributed the idea of using a polynomial-based
! approach, using repeated multiplication by x in GF(2**n) where an irreducible
! polynomial for each table size was chosen such that x was a primitive root.
! Christian Tismer later extended that to use division by x instead, as an
! efficient way to get the high bits of the hash code into play. This scheme
! also gave excellent collision statistics, but was more expensive: two
! if-tests were required inside the loop; computing "the next" index took about
! the same number of operations but without as much potential parallelism
! (e.g., computing 5*j can go on at the same time as computing 1+perturb in the
! above, and then shifting perturb can be done while the table index is being
! masked); and the dictobject struct required a member to hold the table's
! polynomial. In Tim's experiments the current scheme ran faster, and with
! less code and memory.
*/

/* Object used as dummy key to fill deleted entries */
static PyObject *dummy; /* Initialized by first call to newdictobject() */

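For concreteness, here's a small Python sketch (not part of the patch; the function name is made up for illustration) that reproduces the size-2**3 probe order quoted in the comment block above and checks the full-period claim for the unperturbed recurrence:

    # Sketch only: the plain j = 5*j + 1 (mod 2**i) recurrence from the
    # comment block, without the perturb term.
    def probe_order(i, start=0):
        """Indices visited in a table of size 2**i, starting at 'start'."""
        size = 1 << i
        j = start
        order = []
        for _ in range(size):
            order.append(j)
            j = (5 * j + 1) % size
        return order

    print(probe_order(3))    # [0, 1, 6, 7, 4, 5, 2, 3] -- matches the comment

    for i in range(1, 11):   # every slot is visited exactly once
        assert sorted(probe_order(i)) == list(range(1 << i))
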
***************
*** 168,174 ****
int ma_fill; /* # Active + # Dummy */
int ma_used; /* # Active */
int ma_size; /* total # slots in ma_table */
- int ma_poly; /* appopriate entry from polys vector */
/* ma_table points to ma_smalltable for small tables, else to
* additional malloc'ed memory. ma_table is never NULL! This rule
* saves repeated runtime null-tests in the workhorse getitem and
--- 162,167 ----
***************
*** 202,209 ****
(mp)->ma_table = (mp)->ma_smalltable; \
(mp)->ma_size = MINSIZE; \
(mp)->ma_used = (mp)->ma_fill = 0; \
- (mp)->ma_poly = polys[0]; \
- assert(MINSIZE < (mp)->ma_poly && (mp)->ma_poly < MINSIZE*2); \
} while(0)

PyObject *
--- 195,200 ----
***************
*** 235,262 ****
This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
Open addressing is preferred over chaining since the link overhead for
chaining would be substantial (100% with typical malloc overhead).
- However, instead of going through the table at constant steps, we cycle
- through the values of GF(2^n). This avoids modulo computations, being
- much cheaper on RISC machines, without leading to clustering.
-
- The initial probe index is computed as hash mod the table size.
- Subsequent probe indices use the values of x^i in GF(2^n)-{0} as an offset,
- where x is a root. The initial offset is derived from hash, too.

All arithmetic on hash should ignore overflow.

! (This version is due to Reimer Behrends, some ideas are also due to
! Jyrki Alakuijala and Vladimir Marangozov.)

This function must never return NULL; failures are indicated by returning
a dictentry* for which the me_value field is NULL. Exceptions are never
reported by this function, and outstanding exceptions are maintained.
*/
static dictentry *
lookdict(dictobject *mp, PyObject *key, register long hash)
{
register int i;
! register unsigned int incr;
register dictentry *freeslot;
register unsigned int mask = mp->ma_size-1;
dictentry *ep0 = mp->ma_table;
--- 226,268 ----
This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
Open addressing is preferred over chaining since the link overhead for
chaining would be substantial (100% with typical malloc overhead).

+ The initial probe index is computed as hash mod the table size. Subsequent
+ probe indices are computed as explained earlier.
+
All arithmetic on hash should ignore overflow.

! (The details in this version are due to Tim Peters, building on many past
! contributions by Reimer Behrends, Jyrki Alakuijala, Vladimir Marangozov and
! Christian Tismer).

This function must never return NULL; failures are indicated by returning
a dictentry* for which the me_value field is NULL. Exceptions are never
reported by this function, and outstanding exceptions are maintained.
*/
+
+ /* #define DUMP_HASH_STUFF */
+ #ifdef DUMP_HASH_STUFF
+ static int nEntry = 0, nCollide = 0, nTrip = 0;
+ #define BUMP_ENTRY ++nEntry
+ #define BUMP_COLLIDE ++nCollide
+ #define BUMP_TRIP ++nTrip
+ #define PRINT_HASH_STUFF \
+ if ((nEntry & 0x1ff) == 0) \
+ fprintf(stderr, "%d %d %d\n", nEntry, nCollide, nTrip)
+
+ #else
+ #define BUMP_ENTRY
+ #define BUMP_COLLIDE
+ #define BUMP_TRIP
+ #define PRINT_HASH_STUFF
+ #endif
+
static dictentry *
lookdict(dictobject *mp, PyObject *key, register long hash)
{
register int i;
! register unsigned int perturb;
register dictentry *freeslot;
register unsigned int mask = mp->ma_size-1;
dictentry *ep0 = mp->ma_table;
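As a rough illustration of what the nEntry/nCollide/nTrip counters above measure, here's a toy Python simulation (my own sketch, not part of the patch; it uses a fixed table size rather than the real resizing logic) of inserting keys with the patch's probe recurrence:

    # Toy model: count lookups (entries), lookups that collide, and total
    # probe-loop trips, using the same recurrence as the patch.
    PERTURB_SHIFT = 5

    def insert_all(keys, power):
        size = 1 << power
        mask = size - 1
        table = [None] * size
        n_entry = n_collide = n_trip = 0
        for h in keys:                 # pretend hash(key) == key, as for ints
            n_entry += 1
            i = h & mask
            if table[i] is None or table[i] == h:
                table[i] = h
                continue
            n_collide += 1
            perturb = h
            while True:
                n_trip += 1
                i = (5 * i + perturb + 1) & mask
                perturb >>= PERTURB_SHIFT
                if table[i] is None or table[i] == h:
                    table[i] = h
                    break
        return n_entry, n_collide, n_trip

    # The pathological key set from the comment block: every key has the
    # same low 15 bits, so every initial index collides.
    print(insert_all([i << 16 for i in range(20000)], 15))
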
***************
*** 265,273 ****
register int checked_error = 0;
register int cmp;
PyObject *err_type, *err_value, *err_tb;
! /* We must come up with (i, incr) such that 0 <= i < ma_size
! and 0 < incr < ma_size and both are a function of hash.
! i is the initial table index and incr the initial probe offset. */
i = hash & mask;
ep = &ep0[i];
if (ep->me_key == NULL || ep->me_key == key)
--- 271,277 ----
register int checked_error = 0;
register int cmp;
PyObject *err_type, *err_value, *err_tb;
! BUMP_ENTRY;
i = hash & mask;
ep = &ep0[i];
if (ep->me_key == NULL || ep->me_key == key)
***************
*** 294,309 ****
}
freeslot = NULL;
}
! /* Derive incr from hash, just to make it more arbitrary. Note that
! incr must not be 0, or we will get into an infinite loop.*/
! incr = hash ^ ((unsigned long)hash >> 3);
!
/* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
! for (;;) {
! if (!incr)
! incr = 1; /* and incr will never be 0 again */
! ep = &ep0[(i + incr) & mask];
if (ep->me_key == NULL) {
if (restore_error)
PyErr_Restore(err_type, err_value, err_tb);
--- 298,310 ----
}
freeslot = NULL;
}
! BUMP_COLLIDE;
/* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
! for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
! BUMP_TRIP;
! i = (i << 2) + i + perturb + 1;
! ep = &ep0[i & mask];
if (ep->me_key == NULL) {
if (restore_error)
PyErr_Restore(err_type, err_value, err_tb);
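To make the loop above concrete, here's a rough Python model of the probe sequence it generates (a sketch only; the helper name and the 32-bit masking are mine). It shows how two keys that collide on their low bits are pulled apart once perturb shifts the high bits in:

    PERTURB_SHIFT = 5

    def probe_sequence(h, power, nprobes=6):
        """First few table indices probed for hash code h, table size 2**power."""
        mask = (1 << power) - 1
        u32 = (1 << 32) - 1            # model C unsigned int wraparound
        i = h & mask                   # initial index: low bits only
        perturb = h & u32              # perturb starts as the full hash code
        seq = [i]
        for _ in range(nprobes):
            i = (5 * i + perturb + 1) & u32
            perturb >>= PERTURB_SHIFT
            seq.append(i & mask)
        return seq

    # 1<<16 and 2<<16 hash to the same initial slot in a 2**15 table,
    # but their probe sequences separate after a couple of steps.
    print(probe_sequence(1 << 16, 15))
    print(probe_sequence(2 << 16, 15))
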
***************
*** 335,344 ****
}
else if (ep->me_key == dummy && freeslot == NULL)
freeslot = ep;
- /* Cycle through GF(2**n). */
- if (incr & 1)
- incr ^= mp->ma_poly; /* clears the lowest bit */
- incr >>= 1;
}
}

--- 336,341 ----
***************
*** 356,362 ****
lookdict_string(dictobject *mp, PyObject *key, register long hash)
{
register int i;
! register unsigned int incr;
register dictentry *freeslot;
register unsigned int mask = mp->ma_size-1;
dictentry *ep0 = mp->ma_table;
--- 353,359 ----
lookdict_string(dictobject *mp, PyObject *key, register long hash)
{
register int i;
! register unsigned int perturb;
register dictentry *freeslot;
register unsigned int mask = mp->ma_size-1;
dictentry *ep0 = mp->ma_table;
***************
*** 370,377 ****
mp->ma_lookup = lookdict;
return lookdict(mp, key, hash);
}
! /* We must come up with (i, incr) such that 0 <= i < ma_size
! and 0 < incr < ma_size and both are a function of hash */
i = hash & mask;
ep = &ep0[i];
if (ep->me_key == NULL || ep->me_key == key)
--- 367,374 ----
mp->ma_lookup = lookdict;
return lookdict(mp, key, hash);
}
! BUMP_ENTRY;
! PRINT_HASH_STUFF;
i = hash & mask;
ep = &ep0[i];
if (ep->me_key == NULL || ep->me_key == key)
***************
*** 385,400 ****
}
freeslot = NULL;
}
! /* Derive incr from hash, just to make it more arbitrary. Note that
! incr must not be 0, or we will get into an infinite loop.*/
! incr = hash ^ ((unsigned long)hash >> 3);
!
/* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
! for (;;) {
! if (!incr)
! incr = 1; /* and incr will never be 0 again */
! ep = &ep0[(i + incr) & mask];
if (ep->me_key == NULL)
return freeslot == NULL ? ep : freeslot;
if (ep->me_key == key
--- 382,394 ----
}
freeslot = NULL;
}
! BUMP_COLLIDE;
/* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
! for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
! BUMP_TRIP;
! i = (i << 2) + i + perturb + 1;
! ep = &ep0[i & mask];
if (ep->me_key == NULL)
return freeslot == NULL ? ep : freeslot;
if (ep->me_key == key
***************
*** 404,413 ****
return ep;
if (ep->me_key == dummy && freeslot == NULL)
freeslot = ep;
- /* Cycle through GF(2**n). */
- if (incr & 1)
- incr ^= mp->ma_poly; /* clears the lowest bit */
- incr >>= 1;
}
}

--- 398,403 ----
***************
*** 448,454 ****
static int
dictresize(dictobject *mp, int minused)
{
! int newsize, newpoly;
dictentry *oldtable, *newtable, *ep;
int i;
int is_oldtable_malloced;
--- 438,444 ----
static int
dictresize(dictobject *mp, int minused)
{
! int newsize;
dictentry *oldtable, *newtable, *ep;
int i;
int is_oldtable_malloced;
***************
*** 456,475 ****

assert(minused >= 0);

! /* Find the smallest table size > minused, and its poly[] entry. */
! newpoly = 0;
! newsize = MINSIZE;
! for (i = 0; i < sizeof(polys)/sizeof(polys[0]); ++i) {
! if (newsize > minused) {
! newpoly = polys[i];
! break;
! }
! newsize <<= 1;
! if (newsize < 0) /* overflow */
! break;
! }
! if (newpoly == 0) {
! /* Ran out of polynomials or newsize overflowed. */
PyErr_NoMemory();
return -1;
}
--- 446,457 ----

assert(minused >= 0);

! /* Find the smallest table size > minused. */
! for (newsize = MINSIZE;
! newsize <= minused && newsize >= 0;
! newsize <<= 1)
! ;
! if (newsize < 0) {
PyErr_NoMemory();
return -1;
}
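The new size-selection loop is simple enough to model directly; as a sanity check, here's a rough Python equivalent (names are mine, and the C signed-int overflow test is imitated with an explicit bound):

    MINSIZE = 8

    def new_table_size(minused):
        """Smallest power-of-2 size strictly greater than minused."""
        newsize = MINSIZE
        while newsize <= minused:
            newsize <<= 1
            if newsize > 2**31 - 1:   # stands in for the overflow check
                raise MemoryError
        return newsize

    assert new_table_size(5) == 8
    assert new_table_size(8) == 16
    assert new_table_size(1000) == 1024
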
***************
*** 511,517 ****
mp->ma_table = newtable;
mp->ma_size = newsize;
memset(newtable, 0, sizeof(dictentry) * newsize);
- mp->ma_poly = newpoly;
mp->ma_used = 0;
i = mp->ma_fill;
mp->ma_fill = 0;
--- 493,498 ----
***************
*** 1255,1261 ****
if (a->ma_used != b->ma_used)
/* can't be equal if # of entries differ */
return 0;
! 
/* Same # of entries -- check all of 'em. Exit early on any diff. */
for (i = 0; i < a->ma_size; i++) {
PyObject *aval = a->ma_table[i].me_value;
--- 1236,1242 ----
if (a->ma_used != b->ma_used)
/* can't be equal if # of entries differ */
return 0;
!
/* Same # of entries -- check all of 'em. Exit early on any diff. */
for (i = 0; i < a->ma_size; i++) {
PyObject *aval = a->ma_table[i].me_value;