
Old topic: {}.popitem() (was Re: {}.first[key,value,item] ...)

Christian Tismer wrote:
Fredrik Lundh wrote:
christian wrote:
That algorithm is really a gem which you should know, so let me try to explain it.
I think someone just won the "brain exploder 2000" award ;-)
<snip>
As you might have guessed, I didn't do this just for fun. It is the old game of explaining what is there, convincing everybody that you at least know what you are talking about, and then three days later coming up with an improved application of the theory.
Today is Monday, 2 days left. :-)
Ok, today is Sunday. I had no time to finish this earlier, but now it is here.

===========================
=====      Claim:     =====
===========================

- Dictionary access time can be improved with a minimal change.

On the hash() function:
All objects are supposed to provide a hash function which is as good as possible. "Good" means providing a wide range of different hash values for different keys.

Problem: There are hash functions which are "good" in this sense, but they do not spread their randomness uniformly over the 32 bits.

Example: Integers use their own value as hash. This is fine as long as the integers are uniformly distributed. But if they all contain a high power of two, for instance, the low bits give a very bad hash function.

Take a dictionary with the integers range(1000) as keys and access all entries. Then use a dictionary with the same integers shifted left by 16. Access time is slowed down by a factor of 100, since every access is a linear search now.

This is not an urgent problem, although applications exist where it can play a role (memory addresses, for instance, can have high factors of two when people do statistics on page accesses...). While it is not a big problem, it is ugly enough to think about a solution.

Solution 1:
-----------
Try to involve more bits of the hash value by doing extra shuffling, either
a) in the dictlook function, or
b) in the hash generation itself.
I believe neither can really be justified for a rare problem. But how about changing the existing solution in a way that gains an improvement without extra cost?

Solution 2: (*the* solution)
----------------------------
Some people may remember what I wrote about re-hashing functions through the multiplicative group GF(2^n)*, and I don't want to repeat that here. The simple idea can be summarized quickly:

The original algorithm uses multiplication by polynomials, and it is guaranteed that these re-hash values jitter through all possible nonzero patterns of the n bits.

Observation: We are using an operation of a finite field. This means that the inverse of multiplication also exists!

Old algorithm (multiplication):
    shift the index left by 1
    if index > mask:
        xor the index with the generator polynomial

New algorithm (division):
    if low bit of index set:
        xor the index with the generator polynomial
    shift the index right by 1

What does this mean? Not much by itself; we are just cycling through our bit patterns in reverse order. But now for the big difference.

First change:  We change from multiplication to division.
Second change: We do not mask the hash value beforehand!

The second change is what I was after: by not masking the hash value when computing the probe increment, all the existing bits in the hash come into play. This can be seen as a polynomial division where the initial remainder (the hash value) was not normalized. After a number of steps, all the extra bits are wheeled into our index instead of being wasted by masking them off. That gives our re-hash some more randomness. Once all the extra bits are sucked in, the guaranteed single-visit cycle begins. There cannot be more than 27 extra cycles in the worst case (dict size = 32, so there are 27 extra bits to consume). I do not expect any bad effect from this modification.
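To make the two step rules concrete, here is a tiny standalone sketch in plain Python (just an illustration of the rules above; the names incr, mask and poly are borrowed from the attached dictest.py, and these helpers are not part of dictobject.c):

def rehash_old(incr, mask, poly):
    # multiplication in GF(2^n): shift left, and reduce with the
    # generator polynomial if the value left the n-bit range
    incr = incr << 1
    if incr > mask:
        incr = incr ^ poly
    return incr

def rehash_new(incr, poly):
    # division in GF(2^n): reduce first if the low bit is set,
    # then shift right -- the exact inverse of the step above
    if incr & 1:
        incr = incr ^ poly
    return incr >> 1

# Example with table size 32 (mask = 31, poly = 32 + 5):
#   rehash_old(16, 31, 37) == 5    and    rehash_new(5, 37) == 16
# Since the new scheme feeds an unmasked value into the division step,
# any high bits of the hash are shifted down and folded in, one per step.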
Here are some results; the dictionaries have 1000 entries:

timing for strings             old=  5.097  new= 5.088
timing for bad integers (<<10) old=101.540  new=12.610
timing for bad integers (<<16) old=571.210  new=19.220

On strings, both algorithms behave the same. On numbers, they differ dramatically. While the current algorithm is 110 times slower on a worst-case dict (quadratic behavior), the new algorithm pays a small penalty for the extra cycles, but is only 4 times slower.

Alternative implementation:
---------------------------
The above approach is conservative in the sense that it tries not to slow down the current implementation in any way. An alternative would be to consume all of the extra bits at once. But this would add an extra "warmup" loop like this to the algorithm:

    while index > mask:
        if low bit of index set:
            xor the index with the generator polynomial
        shift the index right by 1

This is of course a very good digest of the higher bits, since it is a polynomial division and not just some bit xor-ing which might give quite predictable cancellations; therefore it is "the right way" in my sense. It might be cheap, but it would add over 20 cycles to every small dict, so I don't think it is worth doing. Personally, I prefer merging the bits in during the actual lookup, since that already suffices to get access time down from quadratic to logarithmic. (A standalone sketch of this warmup loop follows the attached listing.)

Attached is a direct translation of the relevant parts of dictobject.c into Python, with both algorithms implemented.

cheers - chris

--
Christian Tismer             :^)   <mailto:tismer@tismer.com>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9  9D15 D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com

## dictest.py
## Test of a new rehash algorithm
## Chris Tismer
## 2000-12-17
## Mission Impossible 5oftware Team

# The following is a partial re-implementation of
# Python dictionaries in Python.
# The original algorithm was literally turned
# into Python code.

##/*
##Table of irreducible polynomials to efficiently cycle through
##GF(2^n)-{0}, 2<=n<=30.
##*/
polys = [
    4 + 3, 8 + 3, 16 + 3, 32 + 5, 64 + 3, 128 + 3, 256 + 29,
    512 + 17, 1024 + 9, 2048 + 5, 4096 + 83, 8192 + 27, 16384 + 43,
    32768 + 3, 65536 + 45, 131072 + 9, 262144 + 39, 524288 + 39,
    1048576 + 9, 2097152 + 5, 4194304 + 3, 8388608 + 33, 16777216 + 27,
    33554432 + 9, 67108864 + 71, 134217728 + 39, 268435456 + 9,
    536870912 + 5, 1073741824 + 83,
    0
]

class NULL: pass

class Dictionary:
    dummy = "<dummy key>"

    def __init__(mp, newalg=0):
        mp.ma_size = 0
        mp.ma_poly = 0
        mp.ma_table = []
        mp.ma_fill = 0
        mp.ma_used = 0
        mp.oldalg = not newalg

    def lookdict(mp, key, _hash):
        me_hash, me_key, me_value = range(3) # rec slots
        dummy = mp.dummy

        mask = mp.ma_size-1
        ep0 = mp.ma_table
        i = (~_hash) & mask
        ep = ep0[i]
        if ep[me_key] is NULL or ep[me_key] == key:
            return ep
        if ep[me_key] == dummy:
            freeslot = ep
        else:
            if (ep[me_hash] == _hash and
                cmp(ep[me_key], key) == 0):
                return ep
            freeslot = NULL

        ###### FROM HERE
        if mp.oldalg:
            incr = (_hash ^ (_hash >> 3)) & mask
        else:
            # note that we do not mask!
            # even the shifting may not be worth it.
            incr = _hash ^ (_hash >> 3)
        ###### TO HERE

        if (not incr):
            incr = mask

        while 1:
            ep = ep0[(i+incr)&mask]
            if (ep[me_key] is NULL):
                if (freeslot != NULL):
                    return freeslot
                else:
                    return ep
            if (ep[me_key] == dummy):
                if (freeslot == NULL):
                    freeslot = ep
            elif (ep[me_key] == key or
                  (ep[me_hash] == _hash and
                   cmp(ep[me_key], key) == 0)):
                return ep

            # Cycle through GF(2^n)-{0}
            ###### FROM HERE
            if mp.oldalg:
                incr = incr << 1
                if (incr > mask):
                    incr = incr ^ mp.ma_poly
            else:
                # new algorithm: do a division
                if (incr & 1):
                    incr = incr ^ mp.ma_poly
                incr = incr >> 1
            ###### TO HERE

    def insertdict(mp, key, _hash, value):
        me_hash, me_key, me_value = range(3) # rec slots

        ep = mp.lookdict(key, _hash)
        if (ep[me_value] is not NULL):
            old_value = ep[me_value]
            ep[me_value] = value
        else:
            if (ep[me_key] is NULL):
                mp.ma_fill = mp.ma_fill+1
            ep[me_key] = key
            ep[me_hash] = _hash
            ep[me_value] = value
            mp.ma_used = mp.ma_used+1

    def dictresize(mp, minused):
        me_hash, me_key, me_value = range(3) # rec slots
        oldsize = mp.ma_size
        oldtable = mp.ma_table
        MINSIZE = 4
        newsize = MINSIZE
        for i in range(len(polys)):
            if (newsize > minused):
                newpoly = polys[i]
                break
            newsize = newsize << 1
        else:
            return -1
        _nullentry = range(3)
        _nullentry[me_hash] = 0
        _nullentry[me_key] = NULL
        _nullentry[me_value] = NULL
        newtable = map(lambda x, y=_nullentry: y[:], range(newsize))

        mp.ma_size = newsize
        mp.ma_poly = newpoly
        mp.ma_table = newtable
        mp.ma_fill = 0
        mp.ma_used = 0
        for ep in oldtable:
            if (ep[me_value] is not NULL):
                mp.insertdict(ep[me_key], ep[me_hash], ep[me_value])
        return 0

    # PyDict_GetItem
    def __getitem__(op, key):
        me_hash, me_key, me_value = range(3) # rec slots

        if not op.ma_table:
            raise KeyError, key
        _hash = hash(key)
        return op.lookdict(key, _hash)[me_value]

    # PyDict_SetItem
    def __setitem__(op, key, value):
        mp = op
        _hash = hash(key)
        ## /* if fill >= 2/3 size, double in size */
        if (mp.ma_fill*3 >= mp.ma_size*2):
            if (mp.dictresize(mp.ma_used*2) != 0):
                if (mp.ma_fill+1 > mp.ma_size):
                    raise MemoryError
        mp.insertdict(key, _hash, value)

    # more interface functions
    def keys(self):
        me_hash, me_key, me_value = range(3) # rec slots
        res = []
        for _hash, _key, _value in self.ma_table:
            if _value is not NULL:
                res.append(_key)
        return res

    def values(self):
        me_hash, me_key, me_value = range(3) # rec slots
        res = []
        for _hash, _key, _value in self.ma_table:
            if _value is not NULL:
                res.append(_value)
        return res

    def items(self):
        me_hash, me_key, me_value = range(3) # rec slots
        res = []
        for _hash, _key, _value in self.ma_table:
            if _value is not NULL:
                res.append((_key, _value))
        return res

    def __cmp__(self, other):
        mine = self.items()
        others = other.items()
        mine.sort()
        others.sort()
        return cmp(mine, others)

######################################################
## tests

def timing(func, args=None, n=1, **keywords):
    import time
    time = time.time
    appl = apply
    if args is None: args = ()
    if type(args) != type(()): args = (args,)
    rep = range(n)
    dummyarg = ("",)
    dummykw = {}
    dummyfunc = len
    if keywords:
        before = time()
        for i in rep: res = appl(dummyfunc, dummyarg, dummykw)
        empty = time()-before
        before = time()
        for i in rep: res = appl(func, args, keywords)
    else:
        before = time()
        for i in rep: res = appl(dummyfunc, dummyarg)
        empty = time()-before
        before = time()
        for i in rep: res = appl(func, args)
    after = time()
    return round(after-before-empty, 4), res

def test(lis, dic):
    for key in lis: dic[key]

def nulltest(lis, dic):
    for key in lis: dic

def string_dicts():
    d1 = Dictionary()    # original
    d2 = Dictionary(1)   # other rehash
    for i in range(1000):
        s = str(i) * 5
        d1[s] = d2[s] = i
    return d1, d2
def badnum_dicts():
    d1 = Dictionary()    # original
    d2 = Dictionary(1)   # other rehash
    for i in range(1000):
        bad = i << 16
        d1[bad] = d2[bad] = i
    return d1, d2

def do_test(dict, keys, n):
    t0 = timing(nulltest, (keys, dict), n)[0]
    t1 = timing(test, (keys, dict), n)[0]
    return t1-t0

if __name__ == "__main__":
    sdold, sdnew = string_dicts()
    bdold, bdnew = badnum_dicts()
    print "timing for strings old=%.3f new=%.3f" % (
        do_test(sdold, sdold.keys(), 100),
        do_test(sdnew, sdnew.keys(), 100))
    print "timing for bad integers old=%.3f new=%.3f" % (
        do_test(bdold, bdold.keys(), 10) * 10,
        do_test(bdnew, bdnew.keys(), 10) * 10)

"""
D:\crml_doc\platf\py>python dictest.py
timing for strings old=5.097 new=5.088
timing for bad integers old=101.540 new=12.610
"""
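As promised above, here is the "warmup" loop from the alternative implementation as a standalone sketch (just an illustration in the style of dictest.py, using incr rather than "index"; it is not wired into the Dictionary class, and lookdict's existing "if not incr: incr = mask" guard would still apply afterwards):

def consume_extra_bits(incr, mask, poly):
    # digest all hash bits above the table size by polynomial division
    # before the normal probe loop starts
    while incr > mask:
        if incr & 1:
            incr = incr ^ poly
        incr = incr >> 1
    return incr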