Storing pairs of (int, int) in a database : which db to choose ?
skip at pobox.com
Tue Dec 23 18:00:12 CET 2003
andrei> So my problem now is what should I use to store my data and
andrei> still be able to retrieve all messages for a given word very
There's the rub. PySQLlite consumes a lot of disk space presumably to
support efficient access to the data (hashing keys, etc). If you want
compact storage you aren't likely to do much better than
Unfortunately, searching that will probably be a bit slow.
If the word ids and msg ids have fixed maximum sizes you might try zero- or
space- padding them so your records are of fixed length. You can then store
the file sorted by wordid and do a binary search in the file looking records
Another possibility, assuming your database keys are integers, is to try the
bsddb recno format database. I've never used it before (not as much call
for a database file with integer keys), but it seems to be *relatively*
space-efficient. I believe it doesn't actually need to store the keys,
since they are implicit.
I built a recno db file containing the pickle of the squares of the numbers
in range(1, 100000):
>>> for n in range(1,100000):
... db1[n] = pickle.dumps(n*n)
The total string length of the pickled squares was 1.3MB. The file size was
More information about the Python-list