key/value store optimized for disk storage

Steve Howell showell30 at yahoo.com
Thu May 3 04:42:54 EDT 2012


On May 2, 11:48 pm, Paul Rubin <no.em... at nospam.invalid> wrote:
> Paul Rubin <no.em... at nospam.invalid> writes:
> >looking at the spec more closely, there are 256 hash tables.. ...
>
> You know, there is a much simpler way to do this, if you can afford to
> use a few hundred MB of memory and you don't mind some load time when
> the program first starts.  Just dump all the data sequentially into a
> file.  Then scan through the file, building up a Python dictionary
> mapping data keys to byte offsets in the file (this is a few hundred MB
> if you have 3M keys).  Then dump the dictionary as a Python pickle and
> read it back in when you start the program.
>
> You may want to turn off the cyclic garbage collector when building or
> loading the dictionary, as it can badly slow down the construction of
> big lists and maybe dicts (I'm not sure of the latter).
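A minimal Python 3 sketch of the offset-index idea quoted above. It is not code from either poster. It assumes a hypothetical length-prefixed record format (4-byte key length, 4-byte value length, then key bytes, then value bytes), and all function names are illustrative. It scans the data file once to map each key to its record's byte offset, pickles that dict, and later seeks straight to a record. Following Paul's suggestion, it turns off the cyclic garbage collector while building and loading the index and restores the previous GC state afterwards.

```python
import gc
import os
import pickle
import struct
import tempfile

# Hypothetical record layout: big-endian key length and value length,
# followed by the raw key bytes and then the raw value bytes.
HEADER = struct.Struct(">II")


def write_records(data_path, items):
    """Dump (key, value) byte pairs sequentially into one data file."""
    with open(data_path, "wb") as f:
        for key, value in items:
            f.write(HEADER.pack(len(key), len(value)))
            f.write(key)
            f.write(value)


def build_index(data_path):
    """Scan the data file once, mapping each key to its record's byte offset."""
    index = {}
    gc_was_enabled = gc.isenabled()
    gc.disable()  # avoid cyclic-GC passes while allocating many objects
    try:
        with open(data_path, "rb") as f:
            while True:
                offset = f.tell()
                header = f.read(HEADER.size)
                if not header:
                    break  # end of file
                key_len, value_len = HEADER.unpack(header)
                key = f.read(key_len)
                f.seek(value_len, os.SEEK_CUR)  # skip over the value bytes
                index[key] = offset
    finally:
        if gc_was_enabled:
            gc.enable()
    return index


def save_index(index, index_path):
    """Persist the key -> offset dict as a pickle."""
    with open(index_path, "wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(index_path):
    """Load the pickled index at startup, again with the cyclic GC paused."""
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        with open(index_path, "rb") as f:
            return pickle.load(f)
    finally:
        if gc_was_enabled:
            gc.enable()


def lookup(data_file, index, key):
    """Seek directly to the record for key and return its value, or None."""
    offset = index.get(key)
    if offset is None:
        return None
    data_file.seek(offset)
    key_len, value_len = HEADER.unpack(data_file.read(HEADER.size))
    data_file.seek(key_len, os.SEEK_CUR)  # skip over the stored key
    return data_file.read(value_len)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "data.bin")
        index_path = os.path.join(tmp, "index.pickle")
        write_records(data_path, [(b"alpha", b"1"), (b"beta", b"22")])
        save_index(build_index(data_path), index_path)
        index = load_index(index_path)
        with open(data_path, "rb") as f:
            print(lookup(f, index, b"beta"))  # b'22'
```

Each lookup costs one seek plus two small reads, and the only per-key memory is the dict entry. That in-memory dict is where the "few hundred MB for 3M keys" estimate comes from.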

I'm starting to lean toward the file-offset/seek approach.  I am
writing some benchmarks on it, comparing it to a more file-system-based
approach like the one I mentioned in my original post.  I'll report back
when I get results, but it's already way past my bedtime for tonight.

Thanks for all your help and suggestions.


