Cache a large list to disk

Radovan Garabik garabik at kassiopeia.juls.savba.sk
Tue May 18 02:15:57 EDT 2004


Chris <iamlevis3 at hotmail.com> wrote:
> I have a set of routines, the first of which reads lots and lots of
> data from disparate regions of disk.  This read routine takes 40
> minutes on a P3-866 (with IDE drives).  This routine populates an
> array with a number of dictionaries, e.g.,
> 
> [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
> {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
> {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
> {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
>        (not actually the data i'm reading)
> 
> This information is acted upon by subsequent routines.  These routines
> change very often, but the data changes very infrequently (the
> opposite pattern of what I'm used to).  This data changes once per
> week, so I can safely cache this data to a big file on disk, and read
> out of this big file -- rather than having to read about 10,000 files
> -- when the program is loaded.
> 
> Now, if this were C I'd know how to do this in a pretty
> straightforward manner.  But being new to Python, I don't know how I
> can (hopefully easily) write this data to a file, and then read it out
> into memory on subsequent launches.
> 
> If anyone can provide some pointers, or even some sample code on how
> to accomplish this, it would be greatly appreciated.

As already mentioned, use cPickle or shelve.
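Here is a minimal sketch of the pickle/shelve route, written for current Python 3. In Python 3, `cPickle` no longer exists as a separate module: the standard `pickle` module uses the C implementation automatically. The file paths and function names below are hypothetical. The list is written once a week and reloaded at startup, instead of re-reading the ~10,000 source files.

```python
import pickle
import shelve


def save_pickle(records, path):
    # Write the whole list of dicts in one go, using the fastest binary protocol.
    with open(path, "wb") as f:
        pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    # Read the whole list back into memory on program start-up.
    with open(path, "rb") as f:
        return pickle.load(f)


def save_shelf(records, path):
    # shelve keys must be strings, so the list index is used as the key.
    # flag="n" always starts from a new, empty shelf.
    with shelve.open(path, flag="n") as db:
        for i, rec in enumerate(records):
            db[str(i)] = rec


def load_shelf(path):
    # Rebuild the list in index order; keys are "0" .. "n-1" by construction.
    with shelve.open(path, flag="r") as db:
        return [db[str(i)] for i in range(len(db))]
```

`pickle` loads everything at once. A shelf lets you fetch single records by key without unpickling the rest.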
However, depending on how big your dictionaries are and how many of
them you have, you can use *dbm databases instead of dictionaries, with
the numbers packed using the struct module (I have found this is
sometimes much more efficient than using shelve).
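Here is a sketch of the dbm + struct idea. It assumes, as a hypothetical, that every record has exactly the five keys from the sample and that each value fits in a signed 32-bit integer. Because the field order is fixed, only the packed numbers are stored: no key names and no pickle overhead. It uses Python 3's `dbm` module, which picks whichever dbm backend is available. `FIELDS`, `save_records` and `load_record` are illustrative names.

```python
import dbm
import struct

# Assumed fixed field order; the key names themselves are not stored.
FIELDS = ("el1", "el2", "el3", "el4", "el5")
# Five little-endian signed 32-bit ints = 20 bytes per record (assumed range).
RECORD = struct.Struct("<5i")


def save_records(records, path):
    # "n" always creates a new, empty database.
    with dbm.open(path, "n") as db:
        for i, rec in enumerate(records):
            # str keys are encoded to bytes by dbm automatically.
            db[str(i)] = RECORD.pack(*(rec[f] for f in FIELDS))


def load_record(db, i):
    # Unpack one record and rebuild the dict only when it is needed.
    values = RECORD.unpack(db[str(i)])
    return dict(zip(FIELDS, values))
```

At start-up, open the database once with `dbm.open(path, "r")` and call `load_record(db, i)` for each record as you need it.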
Looking at your sample, you could even reorganize the data as:
{'el2': [0, 15, 35, 45],
 'el3': [0, 21, 49, 63],
 ...
}
and use one big dbm database, with the lists represented as array
objects; that will give you a major boost in memory efficiency.
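Here is a sketch of that column-oriented layout. Each dbm entry holds one field's values as an `array.array` serialized with `tobytes()`. The sketch assumes all values fit the `'i'` typecode (a C int). It also assumes the cache is read back on the same machine, because the bytes are stored in native byte order. `to_columns`, `save_columns` and `load_column` are hypothetical helper names.

```python
import array
import dbm


def to_columns(records):
    # Turn [{'el1': .., 'el2': ..}, ...] into {'el1': array('i', [...]), ...}.
    # Assumes a non-empty list whose dicts all share the same keys.
    keys = sorted(records[0])
    return {k: array.array("i", (rec[k] for rec in records)) for k in keys}


def save_columns(columns, path):
    with dbm.open(path, "n") as db:
        for key, arr in columns.items():
            # Raw machine ints: no per-item Python object overhead on disk.
            db[key] = arr.tobytes()


def load_column(db, key):
    # Rebuild one compact array from its stored bytes.
    arr = array.array("i")
    arr.frombytes(db[key])
    return arr
```

An `array('i')` of N items stores N C ints in one buffer. A list of N Python ints stores one object reference per item, plus the int objects themselves.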

If the arrays are going to be big (really BIG, say some tens of
megabytes), you can store them one per file and use mmap to access
them; I am currently doing something similar.
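Here is a sketch of the one-file-per-array approach with `mmap`. Each column is written as raw C ints with `array.tofile`. The file is then mapped read-only, so the OS pages data in only as it is touched. `write_column` and `MappedColumn` are hypothetical helpers. The sketch assumes a non-empty file of native `'i'` values written on the same machine, and it supports only non-negative indices.

```python
import array
import mmap
import struct

# Native C int, matching array typecode 'i' (same size and byte order).
ITEM = struct.Struct("i")


def write_column(values, path):
    # array.tofile needs a file opened in binary mode.
    with open(path, "wb") as f:
        array.array("i", values).tofile(f)


class MappedColumn:
    """Read-only view of one on-disk column; only touched pages are read."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file (the file must not be empty).
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._mm) // ITEM.size

    def __getitem__(self, i):
        # Non-negative indices only, to keep the sketch short.
        if not 0 <= i < len(self):
            raise IndexError(i)
        return ITEM.unpack_from(self._mm, i * ITEM.size)[0]

    def close(self):
        self._mm.close()
        self._file.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Reading through `struct.unpack_from` avoids holding a `memoryview` on the map, which would otherwise prevent `close()` from releasing it.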


-- 
 -----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
 -----------------------------------------------------------



More information about the Python-list mailing list