[Python-Dev] The Dictionary Gem is polished!
Christian Tismer
tismer@tismer.com
Sun, 17 Dec 2000 19:38:31 +0200
Old topic: {}.popitem() (was Re: {}.first[key,value,item] ...)
Christian Tismer wrote:
>
> Fredrik Lundh wrote:
> >
> > christian wrote:
> > > That algorithm is really a gem which you should know,
> > > so let me try to explain it.
> >
> > I think someone just won the "brain exploder 2000" award ;-)
<snip>
> As you might have guessed, I didn't do this just for fun.
> It is the old game of explaining what is there, convincing
> everybody that you at least know what you are talking about,
> and then three days later coming up with an improved
> application of the theory.
>
> Today is Monday, 2 days left. :-)
Ok, today is Sunday, I had no time to finish this.
But now it is here.
===========================
===== Claim: =====
===========================
- Dictionary access time can be improved with a minimal change -
On the hash() function:
All objects are supposed to provide a hash function which is
as good as possible. Good means that different keys are spread
over a wide range of different hash values.
Problem: There are hash functions which are "good" in this sense,
but they do not spread their randomness uniformly over the
32 bits.
Example: Integers use their own value as hash.
This is fine as long as the integers are uniformly distributed.
But if they all share a large power-of-two factor, for instance,
the low bits give a very bad hash function.
Take a dictionary with the integers range(1000) as keys and
access all entries. Then use a dictionary with the same integers
shifted left by 16. Access time is slowed down by a factor of
100, since every access is now a linear search.
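Here is the experiment as a few lines of plain Python (my
sketch, not part of the patch; probe is my name, and the exact
factor depends on machine and interpreter):

    import time

    def probe(keys, rounds=10):
        # build a dict from the keys, then time repeated lookups
        d = {}
        for k in keys:
            d[k] = None
        start = time.time()
        for r in range(rounds):
            for k in keys:
                d[k]
        return time.time() - start

    good = range(1000)                          # low bits well spread
    bad  = map(lambda i: i << 16, range(1000))  # low 16 bits all zero
    print "good=%.3fs bad=%.3fs" % (probe(good), probe(bad))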
This is not an urgent problem, although applications exist
where this can play a role (memory addresses, for instance,
often carry large power-of-two factors when people do
statistics on page accesses...)
While this is not a big problem, it is ugly enough to
think of a solution.
Solution 1:
-------------
Try to involve more bits of the hash value by doing extra
shuffling, either
a) in the dictlook function, or
b) in the hash generation itself (see the sketch below).
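For concreteness, variant (b) would boil down to an extra
mixing step such as this (purely hypothetical; shuffled_hash
is not something I propose):

    def shuffled_hash(x):
        # hypothetical variant (b): fold the high bits of the hash
        # into the low bits before the dictionary ever sees it
        h = hash(x)
        return h ^ (h >> 16) ^ (h >> 8)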
I believe neither can really be justified for such a rare problem.
But how about changing the existing solution in a way that
improves things without extra cost?
Solution 2: (*the* solution)
----------------------------
Some people may remember what I wrote about re-hashing
functions through the multiplicative group GF(2^n)*,
and I don't want to repeat this here.
The simple idea can be summarized quickly:
The original algorithm uses multiplication by polynomials,
and it is guaranteed that these re-hash values are jittering
through all possible nonzero patterns of the n bits.
Observation: We are using an operation in a finite field.
This means that the inverse of multiplication also exists!
Old algorithm (multiplication):
    incr = incr << 1
    if incr > mask:
        incr = incr ^ poly
New algorithm (division):
    if incr & 1:
        incr = incr ^ poly
    incr = incr >> 1
What does this mean? Not so much, we are just cycling through
our bit patterns in reverse order.
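If you want to check the single-visit property of both rules,
here is a tiny sketch (orbit is my name; the poly value is taken
from the table in the attached script):

    def orbit(poly, mask, divide=0):
        # follow incr under one rehash rule until it repeats
        seen = {}
        incr = 1
        while not seen.has_key(incr):
            seen[incr] = 1
            if divide:
                # new rule: division
                if incr & 1:
                    incr = incr ^ poly
                incr = incr >> 1
            else:
                # old rule: multiplication
                incr = incr << 1
                if incr > mask:
                    incr = incr ^ poly
        return len(seen)

    mask = 31        # table size 32, n = 5 index bits
    poly = 32 + 5    # irreducible polynomial for GF(2^5)
    print orbit(poly, mask), orbit(poly, mask, 1)   # prints: 31 31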
But now for the big difference.
First change: We change from multiplication to division.
Second change: We do not mask the hash value before!
The second change is what I was after: By not masking the
hash value when computing the initial index, all the
existing bits in the hash come into play.
This can be seen like a polynomial division, but the initial
remainder (the hash value) was not normalized. After a number
of steps, all the extra bits are wheeled into our index,
but not wasted by masking them off. That gives our re-hash
some more randomness. When all the extra bits are sucked
in, the guaranteed single-visit cycle begins. There cannot
be more than 27 extra steps in the worst case: a dict of size
32 uses 5 of the 32 hash bits as index, so 27 bits are left
to consume.
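A quick check of that bound with the same division step (again
just a sketch, using the size-32 polynomial):

    poly = 32 + 5          # GF(2^5): table size 32
    mask = 31
    incr = 0xFFFFFFFFL     # a full 32-bit hash, all extra bits set
    steps = 0
    while incr > mask:
        if incr & 1:
            incr = incr ^ poly
        incr = incr >> 1
        steps = steps + 1
    print steps            # prints: 27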
I do not expect any bad effect from this modification.
Here are some results; the dictionaries have 1000 entries:
timing for strings old= 5.097 new= 5.088
timing for bad integers (<<10) old=101.540 new=12.610
timing for bad integers (<<16) old=571.210 new=19.220
On strings, both algorithms behave the same.
On numbers, they differ dramatically.
While the current algorithm is 110 times slower on a worst-case
dict (quadratic behavior), the new algorithm pays a small price
for the extra steps, but is only 4 times slower.
Alternative implementation:
The above approach is conservative in the sense that it
tries not to slow down the current implementation in any
way. An alternative would be to consume all of the extra
bits at once. But this would add an extra "warmup" loop
like this to the algorithm:
    incr = _hash            # start from the unmasked hash value
    while incr > mask:
        if incr & 1:
            incr = incr ^ poly
        incr = incr >> 1
This is of course a very good digest of the higher bits,
since it is a true polynomial division and not just some
bit xor-ing, which might give quite predictable cancellations;
therefore it is "the right way" in my sense.
It might be cheap, but it would add over 20 extra iterations
to every lookup in a small dict, so I don't think it is worth
doing.
Personally, I prefer the solution of merging the bits in during
the actual lookup, since it suffices to get access time down
from quadratic to logarithmic.
Attached is a direct translation of the relevant parts
of dictobject.c into Python, with both algorithms
implemented.
cheers - chris
--
Christian Tismer :^) <mailto:tismer@tismer.com>
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
[attachment: dictest.py]
## dictest.py
## Test of a new rehash algorithm
## Chris Tismer
## 2000-12-17
## Mission Impossible 5oftware Team
# The following is a partial re-implementation of
# Python dictionaries in Python.
# The original algorithm was literally turned
# into Python code.
##/*
##Table of irreducible polynomials to efficiently cycle through
##GF(2^n)-{0}, 2<=n<=30.
##*/
polys = [
4 + 3,
8 + 3,
16 + 3,
32 + 5,
64 + 3,
128 + 3,
256 + 29,
512 + 17,
1024 + 9,
2048 + 5,
4096 + 83,
8192 + 27,
16384 + 43,
32768 + 3,
65536 + 45,
131072 + 9,
262144 + 39,
524288 + 39,
1048576 + 9,
2097152 + 5,
4194304 + 3,
8388608 + 33,
16777216 + 27,
33554432 + 9,
67108864 + 71,
134217728 + 39,
268435456 + 9,
536870912 + 5,
1073741824 + 83,
0
]
class NULL: pass
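# NULL plays the role of C's NULL pointer: it marks empty key and
# value slots. Each table slot is a list [hash, key, value]; the
# me_hash/me_key/me_value names in the methods below index into
# such a record. The C argument names (mp, op) are kept in place
# of self, true to the literal translation.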
class Dictionary:
dummy = "<dummy key>"
def __init__(mp, newalg=0):
mp.ma_size = 0
mp.ma_poly = 0
mp.ma_table = []
mp.ma_fill = 0
mp.ma_used = 0
mp.oldalg = not newalg
def lookdict(mp, key, _hash):
me_hash, me_key, me_value = range(3) # rec slots
dummy = mp.dummy
mask = mp.ma_size-1
ep0 = mp.ma_table
i = (~_hash) & mask
ep = ep0[i]
if ep[me_key] is NULL or ep[me_key] == key:
return ep
if ep[me_key] == dummy:
freeslot = ep
else:
if (ep[me_hash] == _hash and
cmp(ep[me_key], key) == 0) :
return ep
freeslot = NULL
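        # collision: probe slot (i + incr) & mask, with incr cycling
        # through GF(2^n)-{0}, so every slot other than i is visited
        # exactly once per cycle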
###### FROM HERE
if mp.oldalg:
incr = (_hash ^ (_hash >> 3)) & mask
else:
            # note that we do not mask!
            # even the shifting may not be worth it.
            incr = _hash ^ (_hash >> 3)
###### TO HERE
if (not incr):
incr = mask
while 1:
ep = ep0[(i+incr)&mask]
if (ep[me_key] is NULL) :
if (freeslot != NULL) :
return freeslot
else:
return ep
if (ep[me_key] == dummy) :
if (freeslot == NULL):
freeslot = ep
elif (ep[me_key] == key or
(ep[me_hash] == _hash and
cmp(ep[me_key], key) == 0)) :
return ep
# Cycle through GF(2^n)-{0}
###### FROM HERE
if mp.oldalg:
incr = incr << 1
if (incr > mask):
incr = incr ^ mp.ma_poly
else:
# new algorithm: do a division
if (incr & 1):
incr = incr ^ mp.ma_poly
incr = incr >> 1
###### TO HERE
def insertdict(mp, key, _hash, value):
me_hash, me_key, me_value = range(3) # rec slots
ep = mp.lookdict(key, _hash)
if (ep[me_value] is not NULL) :
old_value = ep[me_value]
ep[me_value] = value
else :
if (ep[me_key] is NULL):
mp.ma_fill=mp.ma_fill+1
ep[me_key] = key
ep[me_hash] = _hash
ep[me_value] = value
mp.ma_used = mp.ma_used+1
def dictresize(mp, minused):
me_hash, me_key, me_value = range(3) # rec slots
oldsize = mp.ma_size
oldtable = mp.ma_table
MINSIZE = 4
newsize = MINSIZE
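        # polys[i] is the polynomial for table size 4 << i; i keeps
        # step with the doubling of newsize in the loop below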
for i in range(len(polys)):
if (newsize > minused) :
newpoly = polys[i]
break
newsize = newsize << 1
else:
return -1
_nullentry = range(3)
_nullentry[me_hash] = 0
_nullentry[me_key] = NULL
_nullentry[me_value] = NULL
newtable = map(lambda x,y=_nullentry:y[:], range(newsize))
mp.ma_size = newsize
mp.ma_poly = newpoly
mp.ma_table = newtable
mp.ma_fill = 0
mp.ma_used = 0
for ep in oldtable:
if (ep[me_value] is not NULL):
mp.insertdict(ep[me_key],ep[me_hash],ep[me_value])
return 0
# PyDict_GetItem
def __getitem__(op, key):
me_hash, me_key, me_value = range(3) # rec slots
if not op.ma_table:
raise KeyError, key
        _hash = hash(key) & 0xFFFFFFFFL   # emulate C unsigned long
        ep = op.lookdict(key, _hash)
        if ep[me_value] is NULL:
            # C's PyDict_GetItem returns NULL here; as a Python
            # mapping we raise KeyError instead
            raise KeyError, key
        return ep[me_value]
# PyDict_SetItem
def __setitem__(op, key, value):
mp = op
        _hash = hash(key) & 0xFFFFFFFFL   # emulate C unsigned long
## /* if fill >= 2/3 size, double in size */
if (mp.ma_fill*3 >= mp.ma_size*2) :
if (mp.dictresize(mp.ma_used*2) != 0):
if (mp.ma_fill+1 > mp.ma_size):
raise MemoryError
mp.insertdict(key, _hash, value)
# more interface functions
def keys(self):
me_hash, me_key, me_value = range(3) # rec slots
res = []
for _hash, _key, _value in self.ma_table:
if _value is not NULL:
res.append( _key)
return res
def values(self):
me_hash, me_key, me_value = range(3) # rec slots
res = []
for _hash, _key, _value in self.ma_table:
if _value is not NULL:
res.append( _value)
return res
def items(self):
me_hash, me_key, me_value = range(3) # rec slots
res = []
for _hash, _key, _value in self.ma_table:
if _value is not NULL:
res.append( (_key, _value) )
return res
def __cmp__(self, other):
mine = self.items()
others = other.items()
mine.sort()
others.sort()
return cmp(mine, others)
######################################################
## tests
def timing(func, args=None, n=1, **keywords) :
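    # crude timing harness: runs func(*args) n times and subtracts
    # the overhead of an equally long loop calling a dummy function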
import time
time=time.time
appl=apply
if args is None: args = ()
if type(args) != type(()) : args=(args,)
rep=range(n)
dummyarg = ("",)
dummykw = {}
dummyfunc = len
if keywords:
before=time()
for i in rep: res=appl(dummyfunc, dummyarg, dummykw)
empty = time()-before
before=time()
for i in rep: res=appl(func, args, keywords)
else:
before=time()
for i in rep: res=appl(dummyfunc, dummyarg)
empty = time()-before
before=time()
for i in rep: res=appl(func, args)
after = time()
return round(after-before-empty,4), res
def test(lis, dic):
for key in lis: dic[key]
def nulltest(lis, dic):
for key in lis: dic
def string_dicts():
d1 = Dictionary() # original
d2 = Dictionary(1) # other rehash
for i in range(1000):
s = str(i) * 5
d1[s] = d2[s] = i
return d1, d2
def badnum_dicts():
d1 = Dictionary() # original
d2 = Dictionary(1) # other rehash
for i in range(1000):
bad = i << 16
d1[bad] = d2[bad] = i
return d1, d2
def do_test(dict, keys, n):
t0 = timing(nulltest, (keys, dict), n)[0]
t1 = timing(test, (keys, dict), n)[0]
return t1-t0
if __name__ == "__main__":
sdold, sdnew = string_dicts()
bdold, bdnew = badnum_dicts()
print "timing for strings old=%.3f new=%.3f" % (
do_test(sdold, sdold.keys(), 100),
do_test(sdnew, sdnew.keys(), 100) )
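    # the bad-integer case runs only 10 rounds (the old algorithm
    # is very slow here); scale by 10 to compare with the 100-round
    # string runs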
print "timing for bad integers old=%.3f new=%.3f" % (
do_test(bdold, bdold.keys(), 10) *10,
do_test(bdnew, bdnew.keys(), 10) *10)
"""
D:\crml_doc\platf\py>python dictest.py
timing for strings old=5.097 new=5.088
timing for bad integers old=101.540 new=12.610
"""