[Tutor] managing memory large dictionaries in python
alan.gauld at btinternet.com
Wed Oct 17 02:03:21 CEST 2012
On 16/10/12 17:57, Abhishek Pratap wrote:
> For my problem I need to store 400-800 million 20 characters keys in a
> dictionary and do counting. This data structure takes about 60-100 Gb
> of RAM.
That's a lot of records, but without details of what kind of counting
you plan to do we can't give specific advice.
> I am wondering if there are slick ways to map the dictionary to a file
> on disk and not store it in memory but still access it as dictionary
> object. Speed is not the main concern
The trivial solution is to use shelve, since that makes a file look like
a dictionary. There are security issues (shelve is built on pickle, so
you should only open shelf files you trust), but they don't sound like
they'd be a problem here. I've no idea what the performance of shelve
would be like with that many records though...
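
As a rough illustration of the shelve approach, here is a minimal sketch.
The function names and the idea of passing the keys in as an iterable are
my assumptions, not anything from your code, and I haven't measured how
it behaves with hundreds of millions of keys.

```python
import shelve

def count_keys(keys, path):
    """Increment a per-key counter stored in a shelf file at `path`.

    `keys` is any iterable of strings (shelve keys must be str).
    """
    with shelve.open(path) as db:
        for key in keys:
            # Read-modify-write: missing keys start from 0.
            db[key] = db.get(key, 0) + 1

def read_count(path, key):
    """Return the stored count for `key`, or 0 if it was never seen."""
    with shelve.open(path) as db:
        return db.get(key, 0)
```

The shelf is used exactly like a dict, so existing counting code should
need very little change. Each update touches the disk-backed file, so
expect it to be far slower than an in-memory dictionary.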
> I did think about databases for this but intuitively it looks like a
> overkill coz for each key you have to first check whether it is
> already present and increase the count by 1 and if not then insert
> the key into dbase.
The database does all of that automatically and fast.
You just need to set it up, load the data and use it - probably around
50 lines of SQL... And you don't need anything fancy for a single-table
database - Access, SQLite, even FoxPro...
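
To show how little is involved, here is a sketch using the sqlite3 module
from the standard library. The table name, column names and function
names are made up for illustration. The "check if present, then increment
or insert" step becomes an UPDATE followed by an INSERT only when no row
was changed.

```python
import sqlite3

def open_counts(path):
    """Open (or create) a single-table database of key counts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts ("
        "key TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    return conn

def add_keys(conn, keys):
    """Count every key in `keys` inside one transaction."""
    with conn:  # commits on success, rolls back on an exception
        for key in keys:
            cur = conn.execute(
                "UPDATE counts SET n = n + 1 WHERE key = ?", (key,)
            )
            if cur.rowcount == 0:
                # Key not seen before: insert it with a count of 1.
                conn.execute(
                    "INSERT INTO counts (key, n) VALUES (?, 1)", (key,)
                )

def get_count(conn, key):
    """Return the count for `key`, or 0 if it is not in the table."""
    row = conn.execute(
        "SELECT n FROM counts WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else 0
```

The PRIMARY KEY gives you an index on the key column, so the lookup is
done by the database rather than by your code. For a load of this size
you would probably want to call add_keys in batches instead of one huge
transaction.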
Or you could just create a big text file and process it line by line if
the data fits that model. Lots of options.
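
One way the text file option could work, assuming (and this is my
assumption, not something you said) that the file has one key per line
and has already been sorted, for example by an external sort tool: equal
keys are then adjacent, so they can be counted in a single pass with
almost no memory. The function name is hypothetical.

```python
from itertools import groupby

def count_sorted_lines(lines):
    """Yield (key, count) pairs from an iterable of already-sorted lines.

    Only one run of identical keys is held at a time, so memory use
    does not grow with the number of distinct keys.
    """
    stripped = (line.rstrip("\n") for line in lines)
    for key, group in groupby(stripped):
        yield key, sum(1 for _ in group)
```

A file object can be passed in directly, since iterating over it yields
one line at a time.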
Personally I'd go with a database for speed, flexibility and ease of coding.
Author of the Learn to Program web site