Sanity check on use of dictionaries

Fredrik Lundh fredrik at pythonware.com
Thu Nov 2 10:56:45 EST 2006


dopey483 at gmail.com wrote:

> I am manipulating lots of log files (about 500,000 files and about 30Gb
> in total) to get them into a little SQL db. Part of this process is
> "normalisation" and creating tables of common data. I am creating
> dictionaries for these in a simple {value,key} form.
> 
> In terms of memory and performance what are the reasonable limits for a
> dictionary with a key and a 16 character string? eg; if I read in one
> of my tables from disk into a dictionary, what sizing is comfortable?
> 100,000 entries? 1,000,000 entries? Lookup times and memory
> requirements are my main worries.

you don't specify what a "key" is, but the following piece of code took 
less than a minute to write, ran in roughly two seconds on my machine, 
and resulted in a CPython process that uses about 80 megabytes of memory.

>>> d = {}
>>> for i in range(1000000):
...     k = str(i).zfill(16)
...     d[k] = k
...
>>> k
'0000000000999999'
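
a rough sketch of how you might reproduce that measurement yourself 
(the exact figures will vary with Python version and platform, and 
sys.getsizeof only counts the dictionary's own hash table, not the 
million string objects it refers to):

import sys
import time

d = {}
start = time.time()
for i in range(1000000):
    k = str(i).zfill(16)    # 16-character, zero-padded string key
    d[k] = k
elapsed = time.time() - start

print("%d entries built in %.2f seconds" % (len(d), elapsed))
print("dict table alone: %.1f MB (string objects not included)"
      % (sys.getsizeof(d) / 1e6))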

since dictionaries use hash tables, the lookup time is usually 
independent of the dictionary size.  also see:

     http://www.effbot.org/pyfaq/how-are-dictionaries-implemented.htm
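
if you want to convince yourself of that, a quick timeit comparison 
(again just a sketch; absolute numbers depend on your machine) should 
show a lookup in a million-entry dictionary costing about the same as 
a lookup in a thousand-entry one:

import timeit

def build(n):
    # map 16-character zero-padded strings to integers
    return dict((str(i).zfill(16), i) for i in range(n))

for n in (1000, 1000000):
    d = build(n)
    key = str(n - 1).zfill(16)     # a key that is present in d
    t = timeit.timeit(lambda: d[key], number=1000000)
    print("%9d entries: %.3f seconds for 1,000,000 lookups" % (n, t))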

</F>



