memory efficient set/dictionary
Josiah Carlson
josiah.carlson at sbcglobal.net
Mon Jun 11 02:52:36 EDT 2007
Rafael Darder Calvo wrote:
>> > > Please recommend a module that allows persistent set/dict storage +
>> > > fast query that best fits my problem,
>> >
>> > What is the problem you are trying to solve? How many keys do you have?
>>
>> Corpus processing. There are on the order of billions to tens of
>> billions of keys (64-bit integers).
>>
> I would recommend using a database, since it meets your
> requirements (off-memory, fast, persistent). The bsddb module
> (Berkeley DB) even gives you a dictionary-like interface.
> http://www.python.org/doc/lib/module-bsddb.html
Standard SQL databases can work for this, but generally your
recommendation of using bsddb works very well for int -> int mappings.
In particular, I would suggest using a btree, if only because I have had
troubles in the past with colliding keys in bsddb.hash (and recno is
just a flat file, and will attempt to create a file of size
i*(record size) to write to record number i).
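To make the recno point concrete, and to show one way of preparing 64-bit
keys for a btree, here is a minimal sketch. The helper names (encode_key,
decode_key, recno_file_size) are made up for illustration and are not part
of bsddb. It assumes the btree compares keys as raw byte strings, in which
case big-endian packing keeps byte order equal to numeric order; that only
matters if you want ordered scans, not for plain lookups.

```python
import struct

def encode_key(n):
    """Pack an unsigned 64-bit int big-endian, so that comparing the
    resulting byte strings gives the same order as comparing the ints."""
    return struct.pack(">Q", n)

def decode_key(raw):
    """Inverse of encode_key."""
    return struct.unpack(">Q", raw)[0]

def recno_file_size(record_number, record_size):
    """Rough size of a flat record file that must reach record_number,
    following the i*(record size) estimate above (hypothetical helper)."""
    return record_number * record_size

if __name__ == "__main__":
    keys = [2**40, 1, 256, 255]
    # Byte-wise order of the encoded keys matches numeric order.
    print(sorted(keys, key=encode_key) == sorted(keys))
    # A single key near 2**40 with 8-byte records already implies ~8 TiB.
    print(recno_file_size(2**40, 8))
```

With keys that are arbitrary 64-bit integers, the flat-file estimate grows
with the largest key rather than with the number of keys, which is why a
btree is the safer choice here.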
As an alternative, there are many well-known search-engine methods for
mapping int -> [int, int, ...], which can be implemented as int -> int,
where the second int is a pointer to an address on disk. Looking into a
few of the open source search implementations may be worthwhile.
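A rough sketch of that int -> [int, int, ...] layout, under my own
assumptions: each list is appended to a flat file as a count followed by
that many 64-bit values, and the index maps a key to the byte offset where
its list starts. A plain dict stands in for the on-disk int -> int store
(a bsddb btree, for example), and the function names are made up for
illustration. Real search engines use more compact encodings.

```python
import os
import struct
import tempfile

U64 = struct.Struct("<Q")  # one unsigned 64-bit value

def append_postings(fh, values):
    """Append a length-prefixed list of uint64 values to the end of fh
    and return the byte offset where the list starts."""
    fh.seek(0, os.SEEK_END)
    offset = fh.tell()
    fh.write(U64.pack(len(values)))
    fh.write(struct.pack("<%dQ" % len(values), *values))
    return offset

def read_postings(fh, offset):
    """Read back the list stored at the given byte offset."""
    fh.seek(offset)
    (count,) = U64.unpack(fh.read(U64.size))
    return list(struct.unpack("<%dQ" % count, fh.read(count * U64.size)))

def demo():
    index = {}  # stand-in for the persistent int -> int (key -> offset) map
    with tempfile.TemporaryFile() as fh:
        index[42] = append_postings(fh, [7, 8, 9])
        index[2**40] = append_postings(fh, [1])
        return index, read_postings(fh, index[42]), read_postings(fh, index[2**40])

if __name__ == "__main__":
    print(demo())
```

The index stays small (one offset per key) no matter how long the lists
get, and a lookup costs one index probe plus one seek into the flat file.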
- Josiah