key/value store optimized for disk storage
Steve Howell
showell30 at yahoo.com
Wed May 2 22:14:54 EDT 2012
This is slightly off topic, but I'm hoping folks can point me in the
right direction.
I'm looking for a fairly lightweight key/value store that works for
this type of problem:
- ideally plays nice with the Python ecosystem
- the data set is static, and written infrequently enough that I
  definitely want *read* performance to trump all
- there is too much data to keep it all in memory (so no memcache)
- users will access keys with fairly uniform, random probability
- the key/value pairs are fairly homogeneous in nature:
  - keys are <= 16 chars
  - values are generally between 1k and 4k bytes
  - approx. 3 million key/value pairs
  - total amount of data == 6 GB
- needs to work on relatively recent versions of FreeBSD and Linux
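As a quick sanity check of the sizes in the list above (assuming "1k" means 1,000 bytes and "6 GB" means 6 decimal gigabytes), 3 million values of 1k to 4k bytes span roughly 3 to 12 GB, and a 6 GB total implies an average value of about 2,000 bytes. The constant names are illustrative:

```python
# Back-of-envelope check of the data-set sizes quoted above.
NUM_PAIRS = 3_000_000
MIN_VALUE, MAX_VALUE = 1_000, 4_000  # "1k to 4k bytes", read as decimal
TOTAL_BYTES = 6_000_000_000          # "6 GB", read as decimal gigabytes

low = NUM_PAIRS * MIN_VALUE          # total if every value were 1k
high = NUM_PAIRS * MAX_VALUE         # total if every value were 4k
avg = TOTAL_BYTES / NUM_PAIRS        # implied average value size

print(f"range: {low / 1e9:.0f}-{high / 1e9:.0f} GB, average value ~{avg:.0f} bytes")
# → range: 3-12 GB, average value ~2000 bytes
```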
My current solution works like this:
- keys are file paths
- directories are 2 levels deep (30 dirs with 100k files each)
- values are file contents
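A minimal sketch of the file-per-key scheme described above, assuming each key already contains its directory component (for example a hypothetical key "d07/a1b2c3") and that values are raw bytes. The helper names and the temporary root directory are illustrative only:

```python
import os
import tempfile


def write_value(root: str, key: str, value: bytes) -> None:
    """Store one value as its own file; the key doubles as the relative path."""
    path = os.path.join(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(value)


def read_value(root: str, key: str) -> bytes:
    """A lookup is one open() plus one read(); the filesystem does the indexing."""
    with open(os.path.join(root, key), "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        write_value(root, "d07/a1b2c3", b"x" * 2048)
        assert read_value(root, "d07/a1b2c3") == b"x" * 2048
```

Every read in this layout pays for a path lookup through the directory tree, which is where the concern about many small files in a shallow tree comes from.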
The current solution isn't horrible, but I'm trying to squeeze a little
more performance and robustness out of it. A minor nuisance is that I
waste a fair amount of disk space: the filesystem allocates space in
whole blocks (commonly 4k), and the values are generally smaller than
that. A larger concern is that I'm not convinced that file systems are
optimized for dealing with lots of little files in a shallow
directory structure.
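To put a rough number on the wasted space: a file generally occupies whole filesystem blocks. This sketch assumes a 4096-byte block size (a common default, but check your own filesystem) and ignores inode and directory overhead. Under that assumption every value of at most 4096 bytes costs exactly one block, so 3 million files occupy about 12.3 GB of disk for roughly 6 GB of data. The helper names are illustrative:

```python
BLOCK_SIZE = 4096  # assumed allocation unit; real filesystems vary


def on_disk_size(value_len: int, block: int = BLOCK_SIZE) -> int:
    """Bytes allocated for a file of value_len bytes, rounded up to whole blocks."""
    return -(-value_len // block) * block  # ceiling division


def slack(value_len: int, block: int = BLOCK_SIZE) -> int:
    """Bytes allocated but unused in the file's last block."""
    return on_disk_size(value_len, block) - value_len


NUM_FILES = 3_000_000
for n in (1000, 2000, 4000):
    print(f"{n:>5} byte value -> {on_disk_size(n)} bytes on disk, {slack(n)} wasted")

# Every value <= 4096 bytes fits in exactly one block.
allocated = NUM_FILES * on_disk_size(2000)
print(f"allocated: {allocated / 1e9:.1f} GB")  # → allocated: 12.3 GB
```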
To deal with the many-small-files concern, a minor refinement would be
to deepen the directory structure, but I want to do due diligence on
other options first.
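One common way to deepen a directory tree is to hash each key and use leading hex digits of the hash as directory names, so files spread evenly no matter how the keys are named. This is a minimal sketch assuming flat key names (no "/" in the key), with BLAKE2 used only as a fast, well-distributed hash (not for security). The function name and its parameters are hypothetical. With two levels of two hex digits there are 256 × 256 = 65,536 leaf directories, or about 46 files each for 3 million keys:

```python
import hashlib
import os


def fanout_path(root: str, key: str, levels: int = 2, width: int = 2) -> str:
    """Return root/<h1>/<h2>/.../key, where each <hN> is `width` hex digits of a key hash.

    Assumes key is a plain file name (no path separators).
    levels * width must not exceed 16, the hex length of the 8-byte digest.
    """
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, key)


print(fanout_path("/data/kv", "a1b2c3"))
```

Because the mapping is deterministic, readers can compute a key's path without any separate index.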
I'm looking for something a little lighter than a full-on database
(either SQL or no-SQL), although I'm not completely ruling out any
alternatives yet.
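One lighter-weight middle ground, not proposed in the post itself, is to pack all values back-to-back into a single data file and keep an index of (offset, length) per key. Because the data set is static, the file can be built once and then only read, which avoids per-file block slack and directory lookups: each read becomes one dict lookup plus one positioned read. This is a minimal sketch under these assumptions. The file names, the pickled index, and the class and function names are hypothetical. The whole index is held in memory, which for ~3 million keys is not free. `os.pread` is Unix-only (available on Linux and FreeBSD).

```python
import os
import pickle
import tempfile
from typing import Dict, Iterable, Tuple

Index = Dict[str, Tuple[int, int]]  # key -> (offset, length) in the data file


def build_store(data_path: str, index_path: str,
                items: Iterable[Tuple[str, bytes]]) -> None:
    """Write every value back-to-back into one file and record where each landed."""
    index: Index = {}
    with open(data_path, "wb") as data:
        for key, value in items:
            index[key] = (data.tell(), len(value))
            data.write(value)
    with open(index_path, "wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)


class PackedReader:
    """Read-only lookups: one dict probe plus one positioned read per key."""

    def __init__(self, data_path: str, index_path: str) -> None:
        with open(index_path, "rb") as f:
            self.index: Index = pickle.load(f)
        self.fd = os.open(data_path, os.O_RDONLY)

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        # pread does not move a shared file position, so concurrent readers are safe.
        return os.pread(self.fd, length, offset)

    def close(self) -> None:
        os.close(self.fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data_path = os.path.join(d, "values.dat")
        index_path = os.path.join(d, "values.idx")
        build_store(data_path, index_path, [("k1", b"a" * 1500), ("k2", b"b" * 3900)])
        reader = PackedReader(data_path, index_path)
        assert reader.get("k2") == b"b" * 3900
        reader.close()
```

Python's standard-library `dbm` module provides a ready-made, file-backed mapping from keys to bytes, which may be worth comparing against before hand-rolling an index like this.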
As I mentioned up top, I'm mostly hoping folks can point me toward
sources they trust, whether that's other mailing lists, good tools,
etc. To the extent that this is on topic and folks don't mind
discussing it here, I'm happy to follow up on any questions.
Thanks,
Steve
P.S. I've already found some good information via Google, but there's
a lot of noise out there.