key/value store optimized for disk storage
Tim Chase
python.list at tim.thechases.com
Fri May 4 13:46:43 EDT 2012
On 05/04/12 12:22, Steve Howell wrote:
> Which variant do you recommend?
>
> """ anydbm is a generic interface to variants of the DBM database
> — dbhash (requires bsddb), gdbm, or dbm. If none of these modules
> is installed, the slow-but-simple implementation in module
> dumbdbm will be used.
>
> """
If you use the stock anydbm module, it automatically picks the best
backend it knows of from the ones installed. Here's a script that
generates comparable data and then does some sample lookups:
import os
import hashlib
import random
from string import letters
import anydbm

KB = 1024
MB = KB * KB
GB = MB * KB
DESIRED_SIZE = 6 * GB  # ~6GB, per your specs
KEYS_TO_SAMPLE = 20
FNAME = "mydata.db"

i = 0
db = anydbm.open(FNAME, 'c')
try:
    print("Generating junk data...")
    # some backends (dumbdbm, and dbm on some platforms) write to
    # differently named files, so adjust this size check if needed
    while os.path.getsize(FNAME) < DESIRED_SIZE:
        # md5.update() returns None, so hash each counter value directly
        key = hashlib.md5(str(i)).hexdigest()[:16]
        size = random.randrange(1*KB, 4*KB)
        value = ''.join(random.choice(letters)
                        for _ in range(size))
        db[key] = value
        i += 1
    print("Gathering %i sample keys" % KEYS_TO_SAMPLE)
    keys_of_interest = random.sample(db.keys(), KEYS_TO_SAMPLE)
finally:
    db.close()

print("Reopening for a cold sample set in case it matters")
db = anydbm.open(FNAME)
try:
    print("Performing %i lookups" % KEYS_TO_SAMPLE)
    for key in keys_of_interest:
        v = db[key]
    print("Done")
finally:
    db.close()
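The script above doesn't report which backend anydbm actually picked. On Python 3, anydbm was renamed dbm (gdbm, dbm and dumbdbm became dbm.gnu, dbm.ndbm and dbm.dumb, and dbhash went away along with bsddb), and dbm.whichdb() can tell you. Here's a minimal sketch, assuming Python 3; detect_backend is a hypothetical helper, and the answer depends on which modules your Python was built with.

```python
# Sketch for Python 3: anydbm became dbm, and dbm.whichdb() reports which
# backend module wrote an existing database.
import dbm
import os
import tempfile
from typing import Optional


def detect_backend(path: str) -> Optional[str]:
    """Create a tiny database at `path` and return the backend that wrote it."""
    # flag 'n' always creates a new, empty database (overwriting any old one)
    with dbm.open(path, "n") as db:
        db[b"probe"] = b"value"
    # whichdb returns a module name such as 'dbm.gnu', 'dbm.ndbm' or
    # 'dbm.dumb', '' if the format is unrecognized, or None if the
    # file(s) cannot be opened
    return dbm.whichdb(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # no extension: some backends add their own (.db, .dat/.dir, ...)
        print(detect_backend(os.path.join(tmp, "probe")))
```

Passing a path without an extension lets whichdb find the files regardless of what suffix the chosen backend appends.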
(your specs said ~6GB of data, keys up to 16 characters, and values of
1KB-4KB, so this should generate data of that shape)
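For anyone on Python 3, here's a minimal sketch of the same benchmark scaled down to a couple of thousand records so it finishes in seconds. It assumes Python 3.6+ (for random.choices), and the helper names make_record, populate and time_lookups, along with the record count, are illustrative choices rather than part of the script above. Reopening the file doesn't evict it from the OS page cache, so on a dataset this small the "cold" lookups will usually be served from memory.

```python
# Python 3 sketch of the benchmark above, scaled down to run in seconds.
import dbm
import hashlib
import os
import random
import string
import tempfile
import time

KB = 1024


def make_record(i):
    """Return one (key, value) pair shaped like the specs in this thread."""
    # 16-character key: the first 16 hex digits of md5(str(i))
    key = hashlib.md5(str(i).encode("ascii")).hexdigest()[:16]
    # value of 1KB up to (not including) 4KB of random ASCII letters
    size = random.randrange(1 * KB, 4 * KB)
    value = "".join(random.choices(string.ascii_letters, k=size))
    return key.encode("ascii"), value.encode("ascii")


def populate(path, n_records):
    """Write n_records generated records to a fresh database; return the keys."""
    keys = []
    with dbm.open(path, "n") as db:
        for i in range(n_records):
            key, value = make_record(i)
            db[key] = value
            keys.append(key)
    return keys


def time_lookups(path, keys):
    """Reopen the database read-only and return seconds spent on lookups."""
    with dbm.open(path, "r") as db:
        start = time.perf_counter()
        for key in keys:
            db[key]  # raises KeyError if a record went missing
        return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mydata")
        keys = populate(path, 2000)
        sample = random.sample(keys, 20)
        print("20 lookups took %.6f s" % time_lookups(path, sample))
```

Keeping the key list from populate() avoids calling keys() on the whole database, which matters once it grows toward the multi-GB sizes discussed here.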
-tkc