Oddity with large dictionary (several million entries)
Peter Otten
__peter__ at web.de
Tue Apr 27 14:08:24 EDT 2010
Lothar Werzinger wrote:
> I am trying to load files into a dictionary for analysis. The size of the
> dictionary will grow quite large (several million entries) and as
> inserting into a dictionary is roughly O(n) I figured if I loaded each
> file into its own dictionary it would speed things up. However it did
> not.
>
> So I decided to write a small test program (attached)
>
> As you can see I am inserting one million entries at a time into a map. I
> ran the tests where I put all three million entries into one map and one
> where I put one million each into its own map.
>
> What I would have expected is that if I insert one million into its own
> map the time to do that would be roughly constant for each map.
> Interestingly it is not. It's about the same as if I load everything into
> one map.
>
> Oh and I have 4G of RAM and the test consumes about 40% at its max. I
> even ran the test on one of our servers with 64G of RAM, so I can rule out
> swapping as the issue.
>
> Can anyone explain this oddity? Any insight is highly appreciated.
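When you are creating objects like there is no tomorrow, Python's cyclic
garbage collection often takes a significant amount of time. The collector
is triggered by the number of container objects allocated, and a full
collection has to visit every tracked object that is still alive, so the
cost grows with everything you have loaded so far, not just the current
dictionary. The first thing I'd try is therefore switching it off with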
import gc
gc.disable()
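
To check whether the collector really is the culprit, the same load can be timed with it enabled and disabled. The following is only a rough sketch: the attached test program is not shown here, so build_dict, timed_build and the tuple payload are hypothetical stand-ins, and the numbers will vary by machine and Python version.

```python
import gc
import time


def build_dict(n):
    """Insert n entries whose values are container objects (tracked by the GC)."""
    d = {}
    for i in range(n):
        # Hypothetical payload; the data from the original test is not shown.
        d[i] = (i, str(i))
    return d


def timed_build(n, disable_gc):
    """Return (seconds, entry count) for one build, restoring the GC state afterwards."""
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        start = time.perf_counter()
        d = build_dict(n)
        elapsed = time.perf_counter() - start
    finally:
        # Re-enable only if the collector was on before we started.
        if was_enabled:
            gc.enable()
    return elapsed, len(d)


if __name__ == "__main__":
    for disable_gc in (False, True):
        seconds, size = timed_build(1_000_000, disable_gc)
        print(f"gc disabled={disable_gc}: {size} entries in {seconds:.2f}s")
```

Wrapping the load in try/finally means the collector is switched back on even if loading fails, which matters if the rest of the program creates reference cycles.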
Peter