Writing huge Sets() to disk
mmokrejs at ribosome.natur.cuni.cz
Mon Jan 10 11:41:52 EST 2005
Robert Brewer wrote:
> Martin MOKREJŠ wrote:
>> I have sets.Set() objects having up to 20E20 items,
>>each is composed of up to 20 characters. Keeping
>>them in memory on a 1GB machine quickly puts me into swap.
>>I don't want to use the dictionary approach, as I see no sense
>>in storing None as a value. The items in a set are unique.
>> How can I write them efficiently to disk?
> got shelve*?
I know about shelve, but doesn't it work like a dictionary?
Why should I use shelve for this? I'd guess it would then be faster to use
bsddb directly, with the string as the key and None as the value.
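
A minimal sketch of that key-with-a-dummy-value idea, assuming a current Python where the old anydbm module lives on as `dbm`. It uses `dbm.dumb` only because that backend is always available; it is slow and not a substitute for bsddb. DBM-style stores take byte strings rather than None as values, so an empty byte string stands in for it. The helper names `save_set` and `contains`, the file name, and the sample strings are made up for illustration.

```python
import dbm.dumb
import os
import tempfile


def save_set(items, path):
    """Store every item as a key; the value is an empty placeholder."""
    with dbm.dumb.open(path, "c") as db:
        for item in items:
            # Keys and values are bytes; b"" plays the role of None.
            db[item.encode("ascii")] = b""


def contains(path, item):
    """Membership test against the on-disk keys."""
    with dbm.dumb.open(path, "r") as db:
        return item.encode("ascii") in db


# Hypothetical sample data: three 11-character strings.
path = os.path.join(tempfile.mkdtemp(), "set11")
save_set({"aaaaaaaaaaa", "bbbbbbbbbbb", "ccccccccccc"}, path)
found = contains(path, "bbbbbbbbbbb")
missing = contains(path, "zzzzzzzzzzz")
```

Only the keys carry information here, which is the point of the complaint above: the store still maintains a general key-to-value index even though the values are empty.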
Even then, note that for the data contained in _set11,
the index should be (or at least could be) optimized for a key size of 11.
There are no other record sizes.
Similarly, _set15 has only keys of size 15. In the docs for bsddb, anydbm
and the other modules, I don't see how to do that optimization. Without
it, I think it would be even slower. And shelve
gives me exactly such an unoptimized, general index on a dictionary.
Maybe I'm wrong, I'm just a beginner here.
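
One way to exploit the fixed key size without any database module is a plain file of equal-width records kept in sorted order and searched by binary search. The following is only a sketch of that alternative technique, not a feature of bsddb or shelve. It assumes the items are ASCII strings of one known length and that a single set can be sorted in memory once before writing (a larger set would need an external sort). The names `write_sorted` and `contains_fixed`, the file name, and the sample strings are hypothetical.

```python
import os
import tempfile


def write_sorted(items, path):
    """Write equal-length ASCII strings back to back, in sorted order."""
    records = sorted(item.encode("ascii") for item in items)
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec)


def contains_fixed(path, item, keysize):
    """Binary search over fixed-width records; no separate index is kept."""
    target = item.encode("ascii")
    if len(target) != keysize:
        return False
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // keysize
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * keysize)  # record number -> byte offset
            rec = f.read(keysize)
            if rec == target:
                return True
            if rec < target:
                lo = mid + 1
            else:
                hi = mid
    return False


# Hypothetical sample data: three 11-character strings.
path = os.path.join(tempfile.mkdtemp(), "set11.dat")
write_sorted({"ccccccccccc", "aaaaaaaaaaa", "bbbbbbbbbbb"}, path)
hit = contains_fixed(path, "bbbbbbbbbbb", 11)
miss = contains_fixed(path, "zzzzzzzzzzz", 11)
```

Because every record has the same width, the file holds exactly keysize bytes per item and a lookup costs about log2(n) seeks. The trade-off is that adding items later means rewriting or merging the file.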