a question on python dict
Duncan Booth
duncan.booth at invalid.invalid
Thu Dec 21 04:06:11 EST 2006
could.net at gmail.com wrote:
> Thank you very much for your explanation!
>
> I made a mistake that I said the hash value should be recalculated
> each time the dict resize, what I wanted to say was that the position
> of each item should be recalculated.
>
> Maybe I should take a look at the source code of python dict some day.
> I'll come here for help then.
>
In your original message you said:
> I think this will waste some time doing the increasing-copying thing.
> If I know I'm going to handle about 20000 items, I can set the size of
> hashtable to 30000.
Have you actually tried timing how much time is wasted doing this? 20000
items really isn't very many for a dictionary to handle.
I ran a quick test to see how long it takes to create a dictionary with
20000 items. Remember, as has been explained to you, the items are not
copied, nor are the hash values recalculated when the dictionary is
resized, so if your dictionary takes much longer to create than the test
below gives you (on your machine that is rather than mine), then the
problem lies elsewhere. e.g. a slow hash function, or the time taken to
create whatever you are storing in the dictionary.
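To see how much a slow hash function can dominate dict construction, here is a small sketch (the `SlowKey` class and its deliberately expensive `__hash__` are made up for illustration, not taken from the original poster's code):

```python
import random
import timeit

class SlowKey:
    """Illustrative wrapper whose hash is deliberately expensive."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        # Simulate a costly hash by doing redundant re-hashing work.
        h = hash(self.value)
        for _ in range(100):
            h = hash((h, self.value))
        return h
    def __eq__(self, other):
        return isinstance(other, SlowKey) and self.value == other.value

fast_keys = [random.random() for _ in range(5000)]
slow_keys = [SlowKey(v) for v in fast_keys]

t_fast = timeit.timeit(lambda: dict.fromkeys(fast_keys), number=5)
t_slow = timeit.timeit(lambda: dict.fromkeys(slow_keys), number=5)
print(t_fast, t_slow)
```

With floats the hash is cheap and the build is fast; with the slow-hashing wrapper the same-sized dict takes much longer to build, even though the dict machinery itself is identical in both cases.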
Just creating the dictionary with no other Python overhead:
timeit.py -s "import random; r = [ random.random() for i in range(20000)]"
"d = dict.fromkeys(r)"
100 loops, best of 3: 11.3 msec per loop
Creating the dictionary using a Python for loop:
timeit.py -s "import random; r = [ random.random() for i in range(20000)]"
"d={}" "for k in r: d[k] = None"
100 loops, best of 3: 11.7 msec per loop
Assigning to the elements in a pre-populated (and therefore the right
size) dictionary:
timeit.py -s "import random; r = [ random.random() for i in range(20000)];
d=dict.fromkeys(r)" "for k in r: d[k] = None"
100 loops, best of 3: 10.1 msec per loop
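The same three measurements can also be run from a script with the timeit module rather than the command line; a minimal sketch (loop counts chosen arbitrarily):

```python
import random
import timeit

r = [random.random() for _ in range(20000)]

# 1. Build the dict in a single C-level call.
t_fromkeys = timeit.timeit(lambda: dict.fromkeys(r), number=100)

# 2. Build it with a Python-level loop (the dict resizes as it grows).
def build_loop():
    d = {}
    for k in r:
        d[k] = None
    return d

t_loop = timeit.timeit(build_loop, number=100)

# 3. Assign into an already-populated (and therefore right-sized) dict.
d = dict.fromkeys(r)
def refill():
    for k in r:
        d[k] = None

t_refill = timeit.timeit(refill, number=100)

print(t_fromkeys, t_loop, t_refill)
```

The absolute numbers will differ from the ones above depending on machine and Python version, but the relative gap between the growing and pre-sized cases is what matters.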
How many times do you create this dictionary, such that shaving 1.5 milliseconds
off 11 milliseconds matters to you?
FWIW, for a 2,000,000 element dictionary the corresponding times are 2.5s
(growing the dict in a loop) and 1.48s (pre-populated), so the situation does
get progressively worse as dictionaries get very large.