[Tutor] managing memory large dictionaries in python

Alan Gauld alan.gauld at btinternet.com
Wed Oct 17 02:03:21 CEST 2012


On 16/10/12 17:57, Abhishek Pratap wrote:

> For my problem I need to store 400-800 million 20-character keys in a
> dictionary and do counting. This data structure takes about 60-100 GB
> of RAM.

That's a lot of records, but without details of what kind of counting 
you plan to do we can't give specific advice.

> I am wondering if there are slick ways to map the dictionary to a file
> on disk and not store it in memory, but still access it as a dictionary
> object. Speed is not the main concern.

The trivial solution is to use shelve, since that makes a file look like 
a dictionary. There are security issues, but they don't sound like they'd 
be a problem here. I've no idea what the performance of shelve would be 
like with that many records, though...
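Something like this would be the shape of it (a minimal, untested
sketch; 'counts.db' and 'keys' are just illustrative names, not
anything from your code):

    import shelve

    counts = shelve.open('counts.db')   # dict-like, but stored on disk
    for key in keys:                    # 'keys': your 20-character strings
        counts[key] = counts.get(key, 0) + 1
    counts.close()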

> I did think about databases for this but intuitively it looks like
> overkill because for each key you have to first check whether it is
> already present and increase the count by 1, and if not, insert the
> key into the database.

The database does all of that automatically and fast.

You just need to set it up, load the data and use it - probably around 
50 lines of SQL... And you don't need anything fancy for a single-table 
database - Access, SQLite, even FoxPro...
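With sqlite3 from the standard library, for example, the
check-and-increment you describe can be two statements, and the
primary key does the "already present?" test for you. A rough sketch
(the file, table and column names below are made up for illustration):

    import sqlite3

    db = sqlite3.connect('counts.sqlite')
    db.execute('CREATE TABLE IF NOT EXISTS counts'
               ' (key TEXT PRIMARY KEY, n INTEGER)')
    for key in keys:                    # 'keys': your 20-character strings
        # Insert the key with a zero count only if it isn't there yet...
        db.execute('INSERT OR IGNORE INTO counts VALUES (?, 0)', (key,))
        # ...then increment its count unconditionally.
        db.execute('UPDATE counts SET n = n + 1 WHERE key = ?', (key,))
    db.commit()
    db.close()

With hundreds of millions of keys you'd want to batch the work and
commit periodically rather than per key, but the shape stays the same.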

Or you could just create a big text file and process it line by line, 
if the data fits that model (one way of doing the counting that way is 
sketched below). Lots of options.
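To count in constant memory - and this is my assumption about the
model, not something you described - you could sort the file first
(e.g. with the Unix sort utility) and then tally runs of identical
keys in a single pass:

    from itertools import groupby

    # Assumes 'keys_sorted.txt' (an illustrative name) holds one key
    # per line and is already sorted, so equal keys are adjacent.
    with open('keys_sorted.txt') as f:
        stripped = (line.rstrip('\n') for line in f)
        for key, run in groupby(stripped):
            print(key, sum(1 for _ in run))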

Personally I'd go with a database for speed, flexibility and ease of coding.


-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/


