Database problems

Edward Grefenstette egrefen at gmail.com
Mon Sep 6 15:01:27 EDT 2010


Dear Pythonistas,

For a project I'm working on, I need to store fairly large
dictionaries (several million keys) in some form (obviously not in
memory). The obvious course of action was to use a database of some
sort.

The operation is pretty simple: a function is handed a generator that
gives it keys and values, and it maps the keys to the values in a
non-relational database (simples!).

I wrote some code implementing this using anydbm (which used dbhash on
my system), and it worked fine for about a million entries, but then
crashed raising a DBPageNotFoundError. I did a little digging around
and couldn't figure out what was causing this or how to fix it.
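
For reference, the setup described above can be sketched roughly as
follows. This is a minimal illustration rather than the actual code: it
assumes Python 3, where anydbm was renamed to dbm, and the names
store_pairs and example_pairs are hypothetical stand-ins for the real
function and generator.

```python
import dbm
import os
import tempfile


def store_pairs(pairs, path):
    """Write (key, value) pairs from an iterable into a dbm file at `path`.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    count = 0
    # "c" opens the database read-write, creating it if it does not exist.
    with dbm.open(path, "c") as db:
        for key, value in pairs:
            # dbm stores bytes, so str keys and values are encoded explicitly.
            db[key.encode("utf-8")] = value.encode("utf-8")
            count += 1
    return count


def example_pairs(n):
    """Hypothetical stand-in for the real generator of keys and values."""
    for i in range(n):
        yield "key%d" % i, "value%d" % i


if __name__ == "__main__":
    # Write a small database into a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "store")
    print(store_pairs(example_pairs(1000), path))
```

Which backend dbm.open picks depends on what is installed on the
system, so the on-disk format (and its failure modes) may differ from
one machine to the next.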

I then quickly swapped anydbm for good ol' fashioned dbm, which uses
gdbm, and it ran even faster and a little longer, but after a million
entries or so it raised the ever-so-unhelpful "gdbm fatal: write
error".

I then threw caution to the winds and tried simply using cPickle's
dump in the hope of obtaining some data persistence, but it crashed
fairly early with an "IOError: [Errno 122] Disk quota exceeded".

Now the question is: is something wrong with these dbms? Can they
not deal with very large sets of data? If they can't, is there a
better tool for my needs? Or is the problem unrelated, and something
to do with my lab computer?

Best,
Edward


