Python dictionary size/entry limit?

intelliminer at gmail.com intelliminer at gmail.com
Sun Feb 22 02:08:32 EST 2009


On Feb 21, 6:47 pm, Stefan Behnel <stefan... at behnel.de> wrote:
> intellimi... at gmail.com wrote:
> > I wrote a script to process textual data and extract phrases from
> > them, storing these phrases in a dictionary. It encounters a
> > MemoryError when there are about 11.18M keys in the dictionary, and
> > the size is about 1.5GB.
> > [...]
> > I have 1GB of physical memory and 3GB in pagefile. Is there a limit to
> > the size or number of entries that a single dictionary can possess? By
> > searching on the web I can't find a clue why this problem occurs.
>
> Python dicts are only limited by what your OS returns as free memory.
> However, when a dict grows, it needs to resize, which means that it has to
> create a bigger copy of itself and redistribute the keys. For a dict that
> is already 1.5GB big, this can temporarily eat a lot more memory than you
> have, at least twice the size of the dict itself.
>
> You may be better served with one of the dbm databases that come with
> Python. They live on-disk but do the usual in-memory caching. They'll
> likely perform a lot better than your OS-level swap file.
>
> Stefan
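
The resize behaviour Stefan describes can actually be watched on a small
scale with sys.getsizeof. A rough sketch, assuming CPython 2.6 (where
sys.getsizeof is available):

import sys

d = {}
last = sys.getsizeof(d)
for i in xrange(200000):
    d[i] = None
    size = sys.getsizeof(d)
    if size != last:
        # the dict just resized: a larger hash table is allocated and
        # every existing key is rehashed into it, so the old and new
        # tables coexist briefly, which is where the memory spike comes from
        print len(d), ':', last, '->', size, 'bytes'
        last = size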

Ummm, I didn't know about the dbm databases. It seems there are many
different modules for this kind of task: gdbm, Berkeley DB, cdb, etc.
I need to implement a constant hashtable with a large number of keys,
but only a small fraction of them will be accessed frequently, and read
speed is crucial. Ideally the implementation would cache all the
frequently used key/value pairs in memory. Which module should I use?
And is there a way to specify the amount of memory it uses for caching?
BTW, the target platform is Linux.
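
Something along these lines is what I have in mind: a minimal sketch
using gdbm with a hand-rolled read cache on top (the file name, class
name and cache limit below are made up for illustration):

import gdbm

class CachedDB(object):
    """Read-only gdbm file with a small in-memory cache for hot keys."""

    def __init__(self, path, max_cached=100000):
        self.db = gdbm.open(path, 'r')   # open the on-disk table read-only
        self.cache = {}
        self.max_cached = max_cached

    def __getitem__(self, key):
        try:
            return self.cache[key]        # hot key, served from memory
        except KeyError:
            value = self.db[key]          # raises KeyError if missing
            if len(self.cache) < self.max_cached:
                self.cache[key] = value   # remember it for next time
            return value

phrases = CachedDB('/tmp/phrases.db')
print phrases['some phrase']

I'm not sure whether any of these modules let me limit their internal
cache from Python, which is why the sketch keeps its own dict of hot
entries.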

Thank you.


