key/value store optimized for disk storage
Steve Howell
showell30 at yahoo.com
Thu May 3 04:42:54 EDT 2012
On May 2, 11:48 pm, Paul Rubin <no.em... at nospam.invalid> wrote:
> Paul Rubin <no.em... at nospam.invalid> writes:
> >looking at the spec more closely, there are 256 hash tables.. ...
>
> You know, there is a much simpler way to do this, if you can afford to
> use a few hundred MB of memory and you don't mind some load time when
> the program first starts. Just dump all the data sequentially into a
> file. Then scan through the file, building up a Python dictionary
> mapping data keys to byte offsets in the file (this is a few hundred MB
> if you have 3M keys). Then dump the dictionary as a Python pickle and
> read it back in when you start the program.
>
> You may want to turn off the cyclic garbage collector when building or
> loading the dictionary, as it can badly slow down the construction of
> big lists and maybe dicts (I'm not sure of the latter).
I'm starting to lean toward the file-offset/seek approach. I am
writing some benchmarks for it, comparing it to a more file-system-based
approach like the one I mentioned in my original post. I'll report back
when I get results, but it's already way past my bedtime for tonight.
Thanks for all your help and suggestions.
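
For reference, here is a minimal sketch of the file-offset/seek idea as Paul describes it: scan the data file once, build a dict of key -> byte offset, pickle the dict, and seek on lookup. The record layout (one "key<TAB>value" record per line) and the function names build_index, save_index, load_index and lookup are assumptions made up for illustration; nothing in the thread specifies them. The gc.disable() calls follow Paul's suggestion and may or may not help for dicts.

```python
import gc
import pickle


def build_index(data_path):
    """Scan the data file once and map each key to its record's byte offset.

    Assumes one record per line, laid out as key<TAB>value (hypothetical format).
    """
    index = {}
    gc.disable()  # suggested in the thread; the benefit for dicts is unverified
    try:
        offset = 0
        with open(data_path, "rb") as f:
            for line in f:
                key, _, _ = line.partition(b"\t")
                index[key] = offset
                offset += len(line)  # binary mode, so len(line) is a byte count
    finally:
        gc.enable()
    return index


def save_index(index, index_path):
    """Dump the key -> offset dict as a pickle."""
    with open(index_path, "wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(index_path):
    """Read the pickled dict back in at program start."""
    gc.disable()
    try:
        with open(index_path, "rb") as f:
            return pickle.load(f)
    finally:
        gc.enable()


def lookup(data_file, index, key):
    """Seek to the record for key and return its value as bytes."""
    data_file.seek(index[key])
    line = data_file.readline()
    _, _, value = line.rstrip(b"\n").partition(b"\t")
    return value
```

Usage would be something like: build and save the index once, then at startup call load_index(), open the data file in "rb" mode, and call lookup(data_file, index, b"some key") per request. Keys are bytes here because the file is read in binary mode, which keeps the offsets exact.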