Oddity with large dictionary (several million entries)

MRAB python at mrabarnett.plus.com
Tue Apr 27 14:09:14 EDT 2010


Lothar Werzinger wrote:
> Hi,
> 
> I am trying to load files into a dictionary for analysis. The size of the
> dictionary will grow quite large (several million entries), and as inserting
> into a dictionary is roughly O(n), I figured that loading each file into its
> own dictionary would speed things up. However, it did not.
> 
> So I decided to write a small test program (attached).
> 
> As you can see, I am inserting one million entries at a time into a map. I
> ran one test where I put all three million entries into one map and one
> where I put one million each into its own map.
> 
> What I would have expected is that if I insert one million entries into its
> own map, the time to do that would be roughly constant for each map.
> Interestingly, it is not: it's about the same as if I load everything into
> one map.
> 
> Oh, and I have 4 GB of RAM and the test consumes about 40% at its peak. I
> even ran the test on one of our servers with 64 GB of RAM, so I can rule out
> swapping as the issue.
> 
> Can anyone explain this oddity? Any insight is highly appreciated.
> 
> Here's the output of the test runs:
> 
> $ ./dicttest.py -t 0
> Inserting into one map
> Inserting 1000000 keys lasted 0:00:26 (38019 1/s)
> len(map) 1000000
> Inserting 1000000 keys lasted 0:01:17 (12831 1/s)
> len(map) 2000000
> Inserting 1000000 keys lasted 0:02:23 (6972 1/s)
> len(map) 3000000
> total 3000000
> 
> $ ./dicttest.py -t 1
> Inserting into three maps
> Inserting 1000000 keys lasted 0:00:32 (30726 1/s)
> len(map) 1000000
> Inserting 1000000 keys lasted 0:01:29 (11181 1/s)
> len(map) 1000000
> Inserting 1000000 keys lasted 0:02:23 (6957 1/s)
> len(map) 1000000
> total 3000000
> 
[snip]
Inserting into a Python dict is amortized O(1), not O(n): the occasional
resize is spread across many inserts. What your timings most likely show is
the cyclic garbage collector at work. Each run allocates millions of
container objects, and every full collection has to scan all of them, so
collections get slower as the total number of live objects grows, no matter
how many dicts those objects are spread across. Try calling gc.disable()
before the bulk load; the per-batch times should then stay roughly constant.
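
A minimal sketch to test that hypothesis (my own, not the OP's attached
dicttest.py; the batch sizes and the list values are illustrative
assumptions, chosen so the GC has container objects to track):

import gc
import time

def time_batches(batches=3, per_batch=1000000, use_gc=True):
    # Insert per_batch keys per batch and print how long each batch takes,
    # in roughly the same format as the OP's output.
    if not use_gc:
        gc.disable()
    try:
        d = {}
        key = 0
        for _ in range(batches):
            start = time.perf_counter()
            for _ in range(per_batch):
                d[key] = [key]  # list values are tracked by the cyclic GC
                key += 1
            elapsed = time.perf_counter() - start
            print("Inserting %d keys lasted %.2fs (%d 1/s)"
                  % (per_batch, elapsed, per_batch / elapsed))
            print("len(map) %d" % len(d))
    finally:
        gc.enable()

time_batches(use_gc=True)   # batches slow down as live objects accumulate
time_batches(use_gc=False)  # per-batch time should stay roughly flat

If dict insertion really is amortized O(1), the second call should show
near-constant per-batch times while the first slows down. Re-enabling gc
afterwards matters if the rest of the program can create reference cycles.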


