memory efficient set/dictionary

Josiah Carlson josiah.carlson at sbcglobal.net
Mon Jun 11 02:52:36 EDT 2007


Rafael Darder Calvo wrote:
>> > > Please recommend a module that allows persistent set/dict storage +
>> > > fast query that best fits my problem,
>> >
>> > What is the problem you are trying to solve? How many keys do you have?
>>
>> Corpus processing. There are on the order of billions to tens of
>> billions of keys (64-bit integers).
>>
> I would recommend you to use a database since it meets your
> requirements (off-memory, fast, persistent). The bsddb module
> (berkeley db) even gives you a dictionary-like interface.
> http://www.python.org/doc/lib/module-bsddb.html

Standard SQL databases can work for this, but in general your
recommendation of bsddb works very well for int -> int mappings.
In particular, I would suggest using a btree, if only because I have had
trouble in the past with colliding keys in bsddb.hash. (recno is
just a flat file, and will attempt to create a file of size
i*(record size) in order to write record number i.)
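
For anyone who wants to try the int -> int idea without bsddb installed,
here is a minimal sketch using the standard-library sqlite3 module as a
stand-in (SQLite stores its tables as B-trees). This is not the bsddb API;
the table name "kv" and the helper names are made up for illustration, and
it assumes the keys fit in a signed 64-bit integer.

```python
import sqlite3

def open_store(path=":memory:"):
    # Assumption: a single table "kv" (hypothetical name) holding int -> int.
    # INTEGER PRIMARY KEY is a signed 64-bit key stored in SQLite's B-tree.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (k INTEGER PRIMARY KEY, v INTEGER NOT NULL)"
    )
    return conn

def put_many(conn, pairs):
    # Insert in one transaction; per-row commits would be far slower.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", pairs)

def get(conn, key, default=None):
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else default

def contains(conn, key):
    # Set-style membership test: only checks that the key exists.
    return conn.execute("SELECT 1 FROM kv WHERE k = ?", (key,)).fetchone() is not None

conn = open_store()  # pass a file name instead of ":memory:" for persistence
put_many(conn, [(2**40 + 7, 1), (3, 99)])
```

With billions of keys you would want to load in large batches, and
preferably in sorted key order, but that depends heavily on the data.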

As an alternative, there are many methods known from search engines for
mapping int -> [int, int, ...], which can be implemented as int -> int,
where the second int is a pointer to an address on disk.  Looking into a
few of the open source search engine implementations may be worthwhile.
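
A rough sketch of the int -> [int, int, ...] idea, under some assumptions of
my own: each list is appended to a data file as a count followed by 64-bit
values, and the index maps a key to the byte offset where its list starts.
The record layout and the function names are hypothetical, and a plain dict
and an in-memory buffer stand in for the on-disk int -> int store and the
data file.

```python
import io
import struct

def append_postings(f, values):
    """Append a list of ints to f and return the byte offset where it starts."""
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    # Assumed layout: unsigned 64-bit count, then that many signed 64-bit ints.
    f.write(struct.pack("<Q", len(values)))
    f.write(struct.pack("<%dq" % len(values), *values))
    return offset

def read_postings(f, offset):
    """Read back the list stored at the given byte offset."""
    f.seek(offset)
    (count,) = struct.unpack("<Q", f.read(8))
    return list(struct.unpack("<%dq" % count, f.read(8 * count)))

postings = io.BytesIO()  # stand-in for a real file opened in binary mode
index = {}               # stand-in for the on-disk int -> int mapping

index[42] = append_postings(postings, [7, 8, 9])
index[43] = append_postings(postings, [100])
```

Looking up key 42 is then one index probe plus one seek and read:
read_postings(postings, index[42]).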

  - Josiah


