[Tutor] managing memory large dictionaries in python
wrw at mac.com
Sat Oct 20 03:03:27 CEST 2012
On Oct 16, 2012, at 12:57 PM, Abhishek Pratap <abhishek.vit at gmail.com> wrote:
> Hi Guys
> For my problem I need to store 400-800 million 20 characters keys in a
> dictionary and do counting. This data structure takes about 60-100 Gb
> of RAM.
> I am wondering if there are slick ways to map the dictionary to a file
> on disk and not store it in memory but still access it as dictionary
> object. Speed is not the main concern in this problem and persistence
> is not needed as the counting will only be done once on the data. We
> want the script to run on smaller memory machines if possible.
> I did think about databases for this, but intuitively it looks like
> overkill, because for each key you first have to check whether it is
> already present and increase the count by 1, and if not, insert
> the key into the database.
> Just want to take your opinion on this.
With apologies for coming to this late. Just a couple of comments -
1) If the keys were sorted before the counting operation took place, the problem could be run in successive batches with as small a memory footprint as desired: once equal keys are adjacent, you only ever need one key and one counter in memory at a time.
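To illustrate point 1, here is a minimal sketch of counting over already-sorted keys with constant memory, using the stdlib itertools.groupby (the function name count_sorted is my own, not from the original discussion):

```python
from itertools import groupby

def count_sorted(sorted_keys):
    """Count occurrences in a sorted stream of keys.

    Because equal keys are adjacent after sorting, each run can be
    counted and emitted immediately -- only the current key and its
    count are held in memory, regardless of how many keys there are.
    """
    for key, run in groupby(sorted_keys):
        yield key, sum(1 for _ in run)
```

In practice sorted_keys would be a generator reading lines from the sorted file on disk, so the whole count never materializes at once.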
2) Sorting in limited memory was a highly developed art back in the early days of IBM mainframe computing (when one had to use tape rather than disk for temporary storage, and 64K, yes "K", was a lot of memory). If you look at Knuth, The Art of Computer Programming, Volume III: Sorting and Searching, you will find several external-sorting algorithms that allow the sort itself to run in as small a memory footprint as needed.
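The classic technique from point 2 is an external merge sort: sort fixed-size chunks in memory, spill each sorted run to a temporary file, then stream a k-way merge over the runs. A rough sketch under those assumptions (names and chunk size are mine):

```python
import heapq
import tempfile
from itertools import islice

def external_sort(keys, chunk_size=1_000_000):
    """Sketch of an external merge sort over a stream of string keys.

    Memory use is bounded by chunk_size keys plus one buffered line
    per run during the merge, no matter how large the input is.
    """
    it = iter(keys)
    run_files = []
    while True:
        # Sort one chunk entirely in memory.
        chunk = sorted(islice(it, chunk_size))
        if not chunk:
            break
        # Spill the sorted run to a temp file, one key per line.
        f = tempfile.TemporaryFile(mode="w+t")
        f.writelines(k + "\n" for k in chunk)
        f.seek(0)
        run_files.append(f)
    # heapq.merge lazily merges the sorted runs, holding only one
    # line per run in memory at a time.
    return (line.rstrip("\n") for line in heapq.merge(*run_files))
```

Feeding the output of external_sort into a streaming counter gives the whole batched pipeline the comments above describe.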
Possibly not interesting, but just saying…
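(For completeness: the disk-backed, dict-like access the original question asks about is roughly what the stdlib shelve module provides, though it is likely too slow for hundreds of millions of keys. A minimal sketch, with count_on_disk being my own illustrative name:

```python
import shelve

def count_on_disk(keys, path):
    """Count string keys in a disk-backed shelve 'dictionary'.

    shelve stores pickled values in a dbm file on disk, so the counts
    need not fit in RAM; only the dbm cache stays in memory.
    """
    with shelve.open(path) as db:
        for key in keys:
            db[key] = db.get(key, 0) + 1
        # Materializing a plain dict is only sensible for small demos;
        # for real data you would iterate db.items() instead.
        return dict(db)
```

Each update pays a disk lookup, which is why sorting first is usually the better trade when speed matters at all.)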