Integer and String Dictionary Keys for Fast Access

Christopher A. Craig list-python at ccraig.org
Fri Oct 4 10:12:32 EDT 2002


Jeff Epler <jepler at unpythonic.net> wrote:
> so not only is it about the same speed to get a string or int key
> from a dict, it's about the same speed to get an int index from a
> list.  This is independant of the length of the key.

Hashings ints is actually very slightly faster than strings, and
string hashing does take linear time relative to size, but I can't
prove this imperically nor does it matter.  I just ripped out the
string hashing code and ran it 100k times on 10k strings and it took
4.6 seconds.  So since the growth (both time and space) of the string
hash is linear, that means it takes about 4.6e-12 seconds per
character.  I doubt you'll ever notice that cost in practice.  Even
the simplest operations like integer addition end up being nearly as
slow as string hashing of practical values.

Also in response to another post in this thread (sorry, my newsreader
is down so I'm having to respond from the archive), cPython does not
presently cache the hash values, but again this doesn't really matter.

>>> import time
>>> 
>>> def ttest1(d, i):
...   t = time.time()
...   for i in xrange(100000): d[i]
...   print time.time()-t
>>>
>>> def ttest2():
...   t = time.time()
...   for i in xrange(100000): 1+1
...   print time.time()-t
>>>
>>> d = {}
>>> for w in open('/usr/dict/words').readlines():
...   d[w] = None
>>> len(d)
25143
>>>
>>> ttest1(d, 'retrieve')
0.354193091393
>>>
>>> ttest2()
0.298590898514


-- 
Christopher A. Craig <list-python at ccraig.org>
"[Windows NT is] not about 'Where do you want to go today?'; it's more like
'Where am I allowed to go today?'" -- Mike Mitchell, NT Systems Administrator




More information about the Python-list mailing list