benjamin.kaplan at case.edu
Mon Sep 6 21:12:53 CEST 2010
On Mon, Sep 6, 2010 at 3:01 PM, Edward Grefenstette <egrefen at gmail.com> wrote:
> Dear Pythonistas,
> For a project I'm working on, I need to store fairly large
> dictionaries (several million keys) in some form (obviously not in
> memory). The obvious course of action was to use a database of some sort.
> The operation is pretty simple: a function is handed a generator that
> gives it keys and values, and it maps the keys to the values in a non-
> relational database (simples!).
> I wrote some code implementing this using anydbm (which used dbhash on
> my system), and it worked fine for about a million entries, but then
> crashed raising a DBPageNotFoundError. I did a little digging around
> and couldn't figure out what was causing this or how to fix it.
> I then quickly swapped anydbm for good ol' fashioned dbm, which uses
> gdbm, and it ran even faster for a little longer, but after a million
> entries or so it raised the ever-so-unhelpful "gdbm fatal: write error".
> I then threw caution to the winds and tried simply using cPickle's
> dump in the hope of obtaining some data persistence, but it crashed
> fairly early with an "IOError: [Errno 122] Disk quota exceeded".
> Now the question is: is something wrong with these dbms? Can they
> not deal with very large sets of data? If not, is there a better
> tool for my needs? Or is the problem unrelated, and something to do
> with my lab computer?
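For reference, the loader you describe only needs a few lines with anydbm.
Here is a minimal sketch, assuming string keys and values (the dbm modules
can only store strings) and a made-up generator and file name; it's an
illustration, not your actual code:

    import anydbm

    def store_pairs(pairs, path="mapping.db"):
        """Write (key, value) pairs from a generator into a dbm file."""
        db = anydbm.open(path, "c")  # "c" creates the file if it doesn't exist
        try:
            for i, (key, value) in enumerate(pairs):
                db[key] = value
                if i % 100000 == 0:
                    print "stored %d entries so far" % i  # progress marker
        finally:
            db.close()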
Just as a guess, I'd say you're hitting a disk quota with your
several-million-key dbm. You might want to talk to the lab administrator
about raising the quota.
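As a quick check, filesystem free space is easy to read from Python, though
a per-user quota won't show up there; for that, the system "quota -s"
command is the authoritative answer. The path below is just an example:

    import os

    # stat the filesystem that holds your home directory
    st = os.statvfs(os.path.expanduser("~"))
    free_mb = st.f_bavail * st.f_frsize / (1024.0 * 1024.0)
    print "%.1f MB available on this filesystem" % free_mb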