a question on python dict

Duncan Booth duncan.booth at invalid.invalid
Thu Dec 21 04:06:11 EST 2006


could.net at gmail.com wrote:

> Thank you very much for your explanation!
> 
> I made a mistake when I said the hash value should be recalculated
> each time the dict resizes; what I wanted to say was that the position
> of each item should be recalculated.
> 
> Maybe I should take a look at the source code of the Python dict some day.
> I'll come here for help then.
> 

In your original message you said:

> I think this will waste some time doing the increasing-copying thing.
> If I know I'm going to handle about 20000 items, I can set the size of
> hashtable to 30000.

Have you actually tried timing how much time is wasted doing this? 20000 
items really isn't very many for a dictionary to handle.

I ran a quick test to see how long it takes to create a dictionary with 
20000 items. Remember, as has been explained to you, the items are not 
copied, nor are the hash values recalculated, when the dictionary is 
resized; so if your dictionary takes much longer to create than the test 
below shows (on your machine, that is, rather than mine), then the 
problem lies elsewhere: e.g. a slow hash function, or the time taken to 
create whatever you are storing in the dictionary.
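
As a quick sanity check (my own illustration, not part of the timing 
tests; the CountingKey class and its `calls` counter are made up for the 
purpose), you can count how many times Python calls __hash__ while 
building a 20000-entry dict. If hashes were recomputed on every resize 
the count would exceed 20000; in fact it is exactly 20000:

class CountingKey(object):
    # Counts calls to __hash__ so we can see whether the dict
    # recomputes hashes when it resizes.
    calls = 0
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        CountingKey.calls += 1
        return hash(self.n)
    def __eq__(self, other):
        return self.n == other.n

d = {}
for i in range(20000):
    d[CountingKey(i)] = None

# One call per insertion; the several resizes along the way
# reused the hash values stored in the table entries.
print(CountingKey.calls)   # -> 20000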

Just creating the dictionary with no other Python overhead:

timeit.py -s "import random; r = [ random.random() for i in range(20000)]" 
"d = dict.fromkeys(r)"
100 loops, best of 3: 11.3 msec per loop

Creating the dictionary using a Python for loop:

timeit.py -s "import random; r = [ random.random() for i in range(20000)]" 
"d={}" "for k in r: d[k] = None"
100 loops, best of 3: 11.7 msec per loop

Assigning to the elements in a pre-populated (and therefore the right 
size) dictionary:

timeit.py -s "import random; r = [ random.random() for i in range(20000)]; 
d=dict.fromkeys(r)" "for k in r: d[k] = None"
100 loops, best of 3: 10.1 msec per loop
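
If you'd rather run all three comparisons in one go, the command lines 
above can be wrapped in a small script using the timeit module. This is 
my own sketch (the test labels and the N constant are mine); the numbers 
will of course vary by machine:

import timeit

N = 20000  # bump this to see how the gap scales
setup = ("import random; "
         "r = [random.random() for i in range(%d)]" % N)

tests = [
    ("dict.fromkeys", "d = dict.fromkeys(r)", setup),
    ("for loop into empty dict",
     "d = {}\nfor k in r: d[k] = None", setup),
    ("for loop into pre-populated dict",
     "for k in r: d[k] = None", setup + "; d = dict.fromkeys(r)"),
]

for name, stmt, setup_code in tests:
    # best total time over 3 repeats of 100 loops each
    best = min(timeit.Timer(stmt, setup_code).repeat(repeat=3, number=100))
    print("%s: %.1f msec per loop" % (name, best / 100.0 * 1000))

Setting N to 2000000 (and reducing `number` to something small, or the 
run will take a long while) should show the widening gap mentioned below.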

How many times do you create this dictionary, such that shaving 1.5 
milliseconds off 11 milliseconds matters to you?

FWIW, for a 2,000,000 element dictionary the times for the last two tests 
are 2.5s and 1.48s respectively, so the situation does get progressively 
worse as dictionaries get very large.


