[Python-Dev] One more dict trick
Tim Peters
tim.one@home.com
Thu, 31 May 2001 20:24:01 -0400
Another version of the patch attached, a bit faster and with a large new
comment block explaining it. It's looking good! As I hope the new comments
make clear, nothing about this approach is "a mystery" -- there are
explainable reasons for each fiddly bit. This gives me more confidence in
it than in the previous approach, and, indeed, it turned out that when I
*thought* "hmm! I bet this change would be a little faster!", it actually
was <wink>.
Index: Objects/dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.96
diff -c -r2.96 dictobject.c
*** Objects/dictobject.c 2001/05/27 07:39:22 2.96
--- Objects/dictobject.c 2001/06/01 00:17:07
***************
*** 12,123 ****
*/
#define MINSIZE 8

! /* define this out if you don't want conversion statistics on exit */
#undef SHOW_CONVERSION_COUNTS

/*
! Table of irreducible polynomials to efficiently cycle through
! GF(2^n)-{0}, 2<=n<=30. A table size is always a power of 2.
! For a table size of 2**i, the polys entry is 2**i + j for some j in 1 thru
! 2**i-1 inclusive. The polys[] entries here happen to add in the smallest j
! values "that work". Work means this: given any integer k in 1 thru 2**i-1
! inclusive, a poly works if & only if repeating this code:
!     print k
!     k <<= 1
!     if k >= 2**i:
!         k ^= poly
! prints every integer in 1 thru 2**i-1 inclusive exactly once before printing
! k a second time. Theory can be used to find such polys efficiently, but the
! operational defn. of "works" is sufficient to find them in reasonable time
! via brute force program (hint: any poly that has an even number of 1 bits
! cannot work; ditto any poly with low bit 0; exploit those).
!
! Some major subtleties: Most hash schemes depend on having a "good" hash
! function, in the sense of simulating randomness. Python doesn't: some of
! its hash functions are trivial, such as hash(i) == i for ints i (excepting
! i == -1, because -1 is the "error occurred" return value from tp_hash).
!
! This isn't necessarily bad! To the contrary, that our hash tables are powers
! of 2 in size, and that we take the low-order bits as the initial table index,
! means that there are no collisions at all for dicts indexed by a contiguous
! range of ints. This is "better than random" behavior, and that's very
! desirable.
!
! On the other hand, when collisions occur, the tendency to fill contiguous
! slices of the hash table makes a good collision resolution strategy crucial;
! e.g., linear probing is right out.
!
! Reimer Behrends contributed the idea of using a polynomial-based approach,
! using repeated multiplication by x in GF(2**n) where a polynomial is chosen
! such that x is a primitive root. This visits every table location exactly
! once, and the sequence of locations probed is highly non-linear.
!
! The same is also largely true of quadratic probing for power-of-2 tables, of
! the specific
!
!     (i + comb(1, 2)) mod size
!     (i + comb(2, 2)) mod size
!     (i + comb(3, 2)) mod size
!     (i + comb(4, 2)) mod size
!     ...
!     (i + comb(j, 2)) mod size
!
! flavor. The polynomial approach "scrambles" the probe indices better, but
! more importantly allows to get *some* additional bits of the hash code into
! play via computing the initial increment, thus giving a weak form of double
! hashing. Quadratic probing cannot be extended that way (the first probe
! offset must be 1, the second 3, the third 6, etc).
!
! Christian Tismer later contributed the idea of using polynomial division
! instead of multiplication. The problem is that the multiplicative method
! can't get *all* the bits of the hash code into play without expensive
! computations that slow down the initial index and/or initial increment
! computation. For a set of keys like [i << 16 for i in range(20000)], under
! the multiplicative method the initial index and increment were the same for
! all keys, so every key followed exactly the same probe sequence, and so
! this degenerated into a (very slow) linear search. The division method uses
! all the bits of the hash code naturally in the increment, although it *may*
! visit locations more than once until such time as all the high bits of the
! increment have been shifted away. It's also impossible to tell in advance
! whether incr is congruent to 0 modulo poly, so each iteration of the loop has
! to guard against incr becoming 0. These are minor costs, as we usually don't
! get into the probe loop, and when we do we usually get out on its first
! iteration.
*/

- static long polys[] = {
- /* 4 + 3, */ /* first active entry if MINSIZE == 4 */
- 8 + 3, /* first active entry if MINSIZE == 8 */
- 16 + 3,
- 32 + 5,
- 64 + 3,
- 128 + 3,
- 256 + 29,
- 512 + 17,
- 1024 + 9,
- 2048 + 5,
- 4096 + 83,
- 8192 + 27,
- 16384 + 43,
- 32768 + 3,
- 65536 + 45,
- 131072 + 9,
- 262144 + 39,
- 524288 + 39,
- 1048576 + 9,
- 2097152 + 5,
- 4194304 + 3,
- 8388608 + 33,
- 16777216 + 27,
- 33554432 + 9,
- 67108864 + 71,
- 134217728 + 39,
- 268435456 + 9,
- 536870912 + 5,
- 1073741824 + 83
- /* 2147483648 + 9 -- if we ever boost this to unsigned long */
- };
-
/* Object used as dummy key to fill deleted entries */
static PyObject *dummy; /* Initialized by first call to newdictobject() */

--- 12,117 ----
*/
#define MINSIZE 8

! /* Define this out if you don't want conversion statistics on exit. */
#undef SHOW_CONVERSION_COUNTS

+ /* See large comment block below. This must be >= 1. */
+ #define PERTURB_SHIFT 5
+
/*
! Major subtleties ahead: Most hash schemes depend on having a "good" hash
! function, in the sense of simulating randomness. Python doesn't: its most
! important hash functions (for strings and ints) are very regular in common
! cases:
!
!     >>> map(hash, (0, 1, 2, 3))
!     [0, 1, 2, 3]
!     >>> map(hash, ("namea", "nameb", "namec", "named"))
!     [-1658398457, -1658398460, -1658398459, -1658398462]
!     >>>
!
! This isn't necessarily bad! To the contrary, in a table of size 2**i, taking
! the low-order i bits as the initial table index is extremely fast, and there
! are no collisions at all for dicts indexed by a contiguous range of ints.
! The same is approximately true when keys are "consecutive" strings. So this
! gives better-than-random behavior in common cases, and that's very desirable.
!
! OTOH, when collisions occur, the tendency to fill contiguous slices of the
! hash table makes a good collision resolution strategy crucial. Taking only
! the last i bits of the hash code is also vulnerable: for example, consider
! [i << 16 for i in range(20000)] as a set of keys. Since ints are their own
! hash codes, and this fits in a dict of size 2**15, the last 15 bits of every
! hash code are all 0: they *all* map to the same table index.
!
! But catering to unusual cases should not slow the usual ones, so we just take
! the last i bits anyway. It's up to collision resolution to do the rest. If
! we *usually* find the key we're looking for on the first try (and, it turns
! out, we usually do -- the table load factor is kept under 2/3, so the odds
! are solidly in our favor), then it makes best sense to keep the initial index
! computation dirt cheap.
!
! The first half of collision resolution is to visit table indices via this
! recurrence:
!
!     j = ((5*j) + 1) mod 2**i
!
! For any initial j in range(2**i), repeating that 2**i times generates each
! int in range(2**i) exactly once (see any text on random-number generation for
! proof). By itself, this doesn't help much: like linear probing (setting j
! += 1, or j -= 1, on each loop trip), it scans the table entries in a fixed
! order. This would be bad, except that's not the only thing we do, and it's
! actually *good* in the common cases where hash keys are consecutive. In an
! example that's really too small to make this entirely clear, for a table of
! size 2**3 the order of indices is:
!
!     0 -> 1 -> 6 -> 7 -> 4 -> 5 -> 2 -> 3 -> 0 [and here it's repeating]
!
! If two things come in at index 5, the first place we look after is index 2,
! not 6, so if another comes in at index 6 the collision at 5 didn't hurt it.
! Linear probing is deadly in this case because there the fixed probe order
! is the *same* as the order consecutive keys are likely to arrive. But it's
! extremely unlikely hash codes will follow a 5*j+1 recurrence by accident,
! and certain that consecutive hash codes do not.
!
! The other half of the strategy is to get the other bits of the hash code
! into play. This is done by initializing a (unsigned) vrbl "perturb" to the
! full hash code, and changing the recurrence to:
!
!     j = (5*j) + 1 + perturb;
!     perturb >>= PERTURB_SHIFT;
!     use j % 2**i as the next table index;
!
! Now the probe sequence depends (eventually) on every bit in the hash code,
! and the pseudo-scrambling property of recurring on 5*j+1 is more valuable,
! because it quickly magnifies small differences in the bits that didn't affect
! the initial index. Note that because perturb is unsigned, if the recurrence
! is executed often enough perturb eventually becomes and remains 0. At that
! point (very rarely reached) the recurrence is on (just) 5*j+1 again, and
! that's certain to find an empty slot eventually (since it generates every int
! in range(2**i), and we make sure there's always at least one empty slot).
!
! Selecting a good value for PERTURB_SHIFT is a balancing act. You want it
! small so that the high bits of the hash code continue to affect the probe
! sequence across iterations; but you want it large so that in really bad cases
! the high-order hash bits have an effect on early iterations. 5 was "the
! best" in minimizing total collisions across experiments Tim Peters ran (on
! both normal and pathological cases), but 4 and 6 weren't significantly worse.
!
! Historical: Reimer Behrends contributed the idea of using a polynomial-based
! approach, using repeated multiplication by x in GF(2**n) where an irreducible
! polynomial for each table size was chosen such that x was a primitive root.
! Christian Tismer later extended that to use division by x instead, as an
! efficient way to get the high bits of the hash code into play. This scheme
! also gave excellent collision statistics, but was more expensive: two
! if-tests were required inside the loop; computing "the next" index took about
! the same number of operations but without as much potential parallelism
! (e.g., computing 5*j can go on at the same time as computing 1+perturb in the
! above, and then shifting perturb can be done while the table index is being
! masked); and the dictobject struct required a member to hold the table's
! polynomial. In Tim's experiments the current scheme ran faster, and with
! less code and memory.
*/

/* Object used as dummy key to fill deleted entries */
static PyObject *dummy; /* Initialized by first call to newdictobject() */

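For concreteness, here's a small Python sketch (not part of the patch; the function name is made up for illustration) that reproduces the size-2**3 probe order quoted in the comment block above and checks the full-period claim for the unperturbed recurrence:

    # Sketch only: the plain j = 5*j + 1 (mod 2**i) recurrence from the
    # comment block, without the perturb term.
    def probe_order(i, start=0):
        """Indices visited in a table of size 2**i, starting at 'start'."""
        size = 1 << i
        j = start
        order = []
        for _ in range(size):
            order.append(j)
            j = (5 * j + 1) % size
        return order

    print(probe_order(3))    # [0, 1, 6, 7, 4, 5, 2, 3] -- matches the comment

    for i in range(1, 11):   # every slot is visited exactly once
        assert sorted(probe_order(i)) == list(range(1 << i))
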
***************
*** 168,174 ****
int ma_fill; /* # Active + # Dummy */
int ma_used; /* # Active */
int ma_size; /* total # slots in ma_table */
- int ma_poly; /* appopriate entry from polys vector */
/* ma_table points to ma_smalltable for small tables, else to
* additional malloc'ed memory. ma_table is never NULL! This rule
* saves repeated runtime null-tests in the workhorse getitem and
--- 162,167 ----
***************
*** 202,209 ****
(mp)->ma_table = (mp)->ma_smalltable; \
(mp)->ma_size = MINSIZE; \
(mp)->ma_used = (mp)->ma_fill = 0; \
- (mp)->ma_poly = polys[0]; \
- assert(MINSIZE < (mp)->ma_poly && (mp)->ma_poly < MINSIZE*2); \
} while(0)

PyObject *
--- 195,200 ----
***************
*** 235,262 ****
This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
Open addressing is preferred over chaining since the link overhead for
chaining would be substantial (100% with typical malloc overhead).
- However, instead of going through the table at constant steps, we cycle
- through the values of GF(2^n). This avoids modulo computations, being
- much cheaper on RISC machines, without leading to clustering.
-
- The initial probe index is computed as hash mod the table size.
- Subsequent probe indices use the values of x^i in GF(2^n)-{0} as an offset,
- where x is a root. The initial offset is derived from hash, too.

All arithmetic on hash should ignore overflow.

! (This version is due to Reimer Behrends, some ideas are also due to
! Jyrki Alakuijala and Vladimir Marangozov.)

This function must never return NULL; failures are indicated by returning
a dictentry* for which the me_value field is NULL. Exceptions are never
reported by this function, and outstanding exceptions are maintained.
*/
static dictentry *
lookdict(dictobject *mp, PyObject *key, register long hash)
{
register int i;
! register unsigned int incr;
register dictentry *freeslot;
register unsigned int mask = mp->ma_size-1;
dictentry *ep0 = mp->ma_table;
--- 226,268 ----
This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
Open addressing is preferred over chaining since the link overhead for
chaining would be substantial (100% with typical malloc overhead).

+ The initial probe index is computed as hash mod the table size. Subsequent
+ probe indices are computed as explained earlier.
+
All arithmetic on hash should ignore overflow.

! (The details in this version are due to Tim Peters, building on many past
! contributions by Reimer Behrends, Jyrki Alakuijala, Vladimir Marangozov and
! Christian Tismer).

This function must never return NULL; failures are indicated by returning
a dictentry* for which the me_value field is NULL. Exceptions are never
reported by this function, and outstanding exceptions are maintained.
*/
+
+ /* #define DUMP_HASH_STUFF */
+ #ifdef DUMP_HASH_STUFF
+ static int nEntry = 0, nCollide = 0, nTrip = 0;
+ #define BUMP_ENTRY ++nEntry
+ #define BUMP_COLLIDE ++nCollide
+ #define BUMP_TRIP ++nTrip
+ #define PRINT_HASH_STUFF \
+ if ((nEntry & 0x1ff) == 0) \
+ fprintf(stderr, "%d %d %d\n", nEntry, nCollide, nTrip)
+
+ #else
+ #define BUMP_ENTRY
+ #define BUMP_COLLIDE
+ #define BUMP_TRIP
+ #define PRINT_HASH_STUFF
+ #endif
+
static dictentry *
lookdict(dictobject *mp, PyObject *key, register long hash)
{
register int i;
! register unsigned int perturb;
register dictentry *freeslot;
register unsigned int mask = mp->ma_size-1;
dictentry *ep0 = mp->ma_table;
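As a rough illustration of what the nEntry/nCollide/nTrip counters above measure, here's a toy Python simulation (my own sketch, not part of the patch; it uses a fixed table size rather than the real resizing logic) of inserting keys with the patch's probe recurrence:

    # Toy model: count lookups (entries), lookups that collide, and total
    # probe-loop trips, using the same recurrence as the patch.
    PERTURB_SHIFT = 5

    def insert_all(keys, power):
        size = 1 << power
        mask = size - 1
        table = [None] * size
        n_entry = n_collide = n_trip = 0
        for h in keys:                 # pretend hash(key) == key, as for ints
            n_entry += 1
            i = h & mask
            if table[i] is None or table[i] == h:
                table[i] = h
                continue
            n_collide += 1
            perturb = h
            while True:
                n_trip += 1
                i = (5 * i + perturb + 1) & mask
                perturb >>= PERTURB_SHIFT
                if table[i] is None or table[i] == h:
                    table[i] = h
                    break
        return n_entry, n_collide, n_trip

    # The pathological key set from the comment block: every key has the
    # same low 15 bits, so every initial index collides.
    print(insert_all([i << 16 for i in range(20000)], 15))
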
***************
*** 265,273 ****
register int checked_error = 0;
register int cmp;
PyObject *err_type, *err_value, *err_tb;
! /* We must come up with (i, incr) such that 0 <= i < ma_size
! and 0 < incr < ma_size and both are a function of hash.
! i is the initial table index and incr the initial probe offset. */
i = hash & mask;
ep = &ep0[i];
if (ep->me_key == NULL || ep->me_key == key)
--- 271,277 ----
register int checked_error = 0;
register int cmp;
PyObject *err_type, *err_value, *err_tb;
! BUMP_ENTRY;
i = hash & mask;
ep = &ep0[i];
if (ep->me_key == NULL || ep->me_key == key)
***************
*** 294,309 ****
}
freeslot = NULL;
}
! /* Derive incr from hash, just to make it more arbitrary. Note that
! incr must not be 0, or we will get into an infinite loop.*/
! incr = hash ^ ((unsigned long)hash >> 3);
!
/* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
! for (;;) {
! if (!incr)
! incr = 1; /* and incr will never be 0 again */
! ep = &ep0[(i + incr) & mask];
if (ep->me_key == NULL) {
if (restore_error)
PyErr_Restore(err_type, err_value, err_tb);
--- 298,310 ----
}
freeslot = NULL;
}
! BUMP_COLLIDE;
/* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
! for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
! BUMP_TRIP;
! i = (i << 2) + i + perturb + 1;
! ep = &ep0[i & mask];
if (ep->me_key == NULL) {
if (restore_error)
PyErr_Restore(err_type, err_value, err_tb);
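To make the loop above concrete, here's a rough Python model of the probe sequence it generates (a sketch only; the helper name and the 32-bit masking are mine). It shows how two keys that collide on their low bits are pulled apart once perturb shifts the high bits in:

    PERTURB_SHIFT = 5

    def probe_sequence(h, power, nprobes=6):
        """First few table indices probed for hash code h, table size 2**power."""
        mask = (1 << power) - 1
        u32 = (1 << 32) - 1            # model C unsigned int wraparound
        i = h & mask                   # initial index: low bits only
        perturb = h & u32              # perturb starts as the full hash code
        seq = [i]
        for _ in range(nprobes):
            i = (5 * i + perturb + 1) & u32
            perturb >>= PERTURB_SHIFT
            seq.append(i & mask)
        return seq

    # 1<<16 and 2<<16 hash to the same initial slot in a 2**15 table,
    # but their probe sequences separate after a couple of steps.
    print(probe_sequence(1 << 16, 15))
    print(probe_sequence(2 << 16, 15))
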
***************
*** 335,344 ****
}
else if (ep->me_key == dummy && freeslot == NULL)
freeslot = ep;
- /* Cycle through GF(2**n). */
- if (incr & 1)
- incr ^= mp->ma_poly; /* clears the lowest bit */
- incr >>= 1;
}
}

--- 336,341 ----
***************
*** 356,362 ****
lookdict_string(dictobject *mp, PyObject *key, register long hash)
{
register int i;
! register unsigned int incr;
register dictentry *freeslot;
register unsigned int mask = mp->ma_size-1;
dictentry *ep0 = mp->ma_table;
--- 353,359 ----
lookdict_string(dictobject *mp, PyObject *key, register long hash)
{
register int i;
! register unsigned int perturb;
register dictentry *freeslot;
register unsigned int mask = mp->ma_size-1;
dictentry *ep0 = mp->ma_table;
***************
*** 370,377 ****
mp->ma_lookup = lookdict;
return lookdict(mp, key, hash);
}
! /* We must come up with (i, incr) such that 0 <= i < ma_size
! and 0 < incr < ma_size and both are a function of hash */
i = hash & mask;
ep = &ep0[i];
if (ep->me_key == NULL || ep->me_key == key)
--- 367,374 ----
mp->ma_lookup = lookdict;
return lookdict(mp, key, hash);
}
! BUMP_ENTRY;
! PRINT_HASH_STUFF;
i = hash & mask;
ep = &ep0[i];
if (ep->me_key == NULL || ep->me_key == key)
***************
*** 385,400 ****
}
freeslot = NULL;
}
! /* Derive incr from hash, just to make it more arbitrary. Note that
! incr must not be 0, or we will get into an infinite loop.*/
! incr = hash ^ ((unsigned long)hash >> 3);
!
/* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
! for (;;) {
! if (!incr)
! incr = 1; /* and incr will never be 0 again */
! ep = &ep0[(i + incr) & mask];
if (ep->me_key == NULL)
return freeslot == NULL ? ep : freeslot;
if (ep->me_key == key
--- 382,394 ----
}
freeslot = NULL;
}
! BUMP_COLLIDE;
/* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
! for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
! BUMP_TRIP;
! i = (i << 2) + i + perturb + 1;
! ep = &ep0[i & mask];
if (ep->me_key == NULL)
return freeslot == NULL ? ep : freeslot;
if (ep->me_key == key
***************
*** 404,413 ****
return ep;
if (ep->me_key == dummy && freeslot == NULL)
freeslot = ep;
- /* Cycle through GF(2**n). */
- if (incr & 1)
- incr ^= mp->ma_poly; /* clears the lowest bit */
- incr >>= 1;
}
}

--- 398,403 ----
***************
*** 448,454 ****
static int
dictresize(dictobject *mp, int minused)
{
! int newsize, newpoly;
dictentry *oldtable, *newtable, *ep;
int i;
int is_oldtable_malloced;
--- 438,444 ----
static int
dictresize(dictobject *mp, int minused)
{
! int newsize;
dictentry *oldtable, *newtable, *ep;
int i;
int is_oldtable_malloced;
***************
*** 456,475 ****

assert(minused >= 0);

! /* Find the smallest table size > minused, and its poly[] entry. */
! newpoly = 0;
! newsize = MINSIZE;
! for (i = 0; i < sizeof(polys)/sizeof(polys[0]); ++i) {
! if (newsize > minused) {
! newpoly = polys[i];
! break;
! }
! newsize <<= 1;
! if (newsize < 0) /* overflow */
! break;
! }
! if (newpoly == 0) {
! /* Ran out of polynomials or newsize overflowed. */
PyErr_NoMemory();
return -1;
}
--- 446,457 ----

assert(minused >= 0);

! /* Find the smallest table size > minused. */
! for (newsize = MINSIZE;
! newsize <= minused && newsize >= 0;
! newsize <<= 1)
! ;
! if (newsize < 0) {
PyErr_NoMemory();
return -1;
}
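The new size-selection loop is simple enough to model directly; as a sanity check, here's a rough Python equivalent (names are mine, and the C signed-int overflow test is imitated with an explicit bound):

    MINSIZE = 8

    def new_table_size(minused):
        """Smallest power-of-2 size strictly greater than minused."""
        newsize = MINSIZE
        while newsize <= minused:
            newsize <<= 1
            if newsize > 2**31 - 1:   # stands in for the overflow check
                raise MemoryError
        return newsize

    assert new_table_size(5) == 8
    assert new_table_size(8) == 16
    assert new_table_size(1000) == 1024
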
***************
*** 511,517 ****
mp->ma_table = newtable;
mp->ma_size = newsize;
memset(newtable, 0, sizeof(dictentry) * newsize);
- mp->ma_poly = newpoly;
mp->ma_used = 0;
i = mp->ma_fill;
mp->ma_fill = 0;
--- 493,498 ----
***************
*** 1255,1261 ****
if (a->ma_used != b->ma_used)
/* can't be equal if # of entries differ */
return 0;
! 
/* Same # of entries -- check all of 'em. Exit early on any diff. */
for (i = 0; i < a->ma_size; i++) {
PyObject *aval = a->ma_table[i].me_value;
--- 1236,1242 ----
if (a->ma_used != b->ma_used)
/* can't be equal if # of entries differ */
return 0;
!
/* Same # of entries -- check all of 'em. Exit early on any diff. */
for (i = 0; i < a->ma_size; i++) {
PyObject *aval = a->ma_table[i].me_value;