key/value store optimized for disk storage
Paul Rubin
no.email at nospam.invalid
Wed May 2 23:29:11 EDT 2012
Steve Howell <showell30 at yahoo.com> writes:
> Thanks. That's definitely in the spirit of what I'm looking for,
> although the non-64 bit version is obviously geared toward a slightly
> smaller data set. My reading of cdb is that it has essentially 64k
> hash buckets, so for 3 million keys, you're still scanning through an
> average of 45 records per read, which is about 90k of data for my
> record size. That seems actually inferior to a btree-based file
> system, unless I'm missing something.
1) presumably you can use more buckets in a 64-bit version; 2) scanning
90k probably still takes far less time than a disk seek, even a "seek"
(several microseconds in practice) on a solid state disk.
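The bucket arithmetic in the quoted text can be checked directly. This is a minimal sketch that assumes keys spread evenly across buckets. The record size of about 2,000 bytes is inferred from "45 records, about 90k" and is not stated in the thread. The 2**20 bucket count is a hypothetical stand-in for "more buckets in a 64-bit version".

```python
def bucket_scan(n_keys, n_buckets, record_bytes):
    """Return (average records per bucket, bytes held in that bucket),
    assuming keys are spread evenly across buckets."""
    records = n_keys / n_buckets
    return records, records * record_bytes

# Figures from the thread: 3 million keys, 64k buckets,
# ~2,000-byte records (record size is an assumption).
print(bucket_scan(3_000_000, 65_536, 2_000))   # about 45.8 records, about 91.6 KB

# Hypothetical larger table, e.g. 2**20 buckets.
print(bucket_scan(3_000_000, 2**20, 2_000))    # about 2.9 records, about 5.7 KB
```

Growing the bucket count by 16x shrinks the per-lookup scan by the same factor, which is the point of 1) above.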
> http://thomas.mangin.com/data/source/cdb.py
> Unfortunately, it looks like you have to first build the whole thing
> in memory.
It's probably fixable, but I'd guess you could just use Bernstein's
cdbmake program instead.
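The in-memory build looks fixable because the cdb layout lets a builder stream records to disk and keep only a (hash, offset) pair per key until the end. Below is a minimal sketch of that approach, assuming the standard 32-bit cdb layout: a 2048-byte header of 256 (table position, slot count) pairs, records stored as little-endian key/data lengths followed by the bytes, and 256 open-addressed tables sized at twice their key count. It is not Bernstein's cdbmake, and it is not the cdb.py linked above. The helpers `write_cdb` and `cdb_get` are hypothetical, and file offsets must stay under 4 GB.

```python
import struct

def cdb_hash(key: bytes) -> int:
    # djb hash used by cdb: h = ((h << 5) + h) ^ c, kept to 32 bits.
    h = 5381
    for c in key:
        h = (((h << 5) + h) & 0xFFFFFFFF) ^ c
    return h

def write_cdb(path, items):
    """Stream (key, value) byte pairs into a cdb file.

    Record bodies go straight to disk; only (hash, offset) per key
    is held in memory until the hash tables are written."""
    tables = [[] for _ in range(256)]
    with open(path, "wb") as f:
        f.write(b"\0" * 2048)          # placeholder header, filled in last
        pos = 2048
        for key, value in items:
            f.write(struct.pack("<LL", len(key), len(value)))
            f.write(key)
            f.write(value)
            h = cdb_hash(key)
            tables[h & 0xFF].append((h, pos))
            pos += 8 + len(key) + len(value)
        header = []
        for entries in tables:
            nslots = 2 * len(entries)  # cdb sizes each table at 2x its keys
            slots = [(0, 0)] * nslots
            for h, rpos in entries:
                i = (h >> 8) % nslots
                while slots[i][1] != 0:  # offset 0 marks an empty slot
                    i = (i + 1) % nslots  # linear probing
                slots[i] = (h, rpos)
            header.append((pos, nslots))
            for h, rpos in slots:
                f.write(struct.pack("<LL", h, rpos))
            pos += 8 * nslots
        f.seek(0)
        for tpos, nslots in header:
            f.write(struct.pack("<LL", tpos, nslots))

def cdb_get(path, key):
    """Look up one key; returns the value bytes or None."""
    h = cdb_hash(key)
    with open(path, "rb") as f:
        f.seek((h & 0xFF) * 8)
        tpos, nslots = struct.unpack("<LL", f.read(8))
        if nslots == 0:
            return None
        start = (h >> 8) % nslots
        for k in range(nslots):
            f.seek(tpos + ((start + k) % nslots) * 8)
            sh, rpos = struct.unpack("<LL", f.read(8))
            if rpos == 0:
                return None            # hit an empty slot: key absent
            if sh == h:
                f.seek(rpos)
                klen, dlen = struct.unpack("<LL", f.read(8))
                if klen == len(key) and f.read(klen) == key:
                    return f.read(dlen)
        return None
```

Because each table holds twice as many slots as keys, a lookup in this layout usually probes only a slot or two rather than scanning a long bucket.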
Alternatively, maybe you could use one of the *dbm libraries, which
burn a little more disk space but support online update.
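For the *dbm route, a minimal sketch uses Python's standard `dbm` module. It picks whichever backend is available (gdbm, ndbm, or the pure-Python dumb fallback), so on-disk size and performance will vary. The helpers `dbm_put` and `dbm_get` are hypothetical names for illustration.

```python
import dbm

def dbm_put(path, key, value):
    # "c" opens read/write and creates the file if it doesn't exist,
    # so each call updates the existing store in place (no rebuild).
    with dbm.open(path, "c") as db:
        db[key] = value

def dbm_get(path, key):
    # "r" opens an existing database read-only; values come back as bytes.
    with dbm.open(path, "r") as db:
        return db[key] if key in db else None
```

Overwriting a key with `dbm_put` replaces its value without touching the rest of the file, which is the online update that cdb's build-once format does not offer.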