Populating huge data structures from disk

Neil Cerutti horpner at yahoo.com
Tue Nov 6 22:43:26 CET 2007


On 2007-11-06, Michael Bacarella <mbac at gpshopper.com> wrote:
> And there's no solace in lists either:
>  
> $ time python eat800.py 
>
> real    4m2.796s
> user    3m57.865s
> sys     0m3.638s
>
> $ cat eat800.py 
> #!/usr/bin/python
>
> import struct
>
> d = []
> f = open('/dev/zero')
> for i in xrange(100000000):
>         d.append(struct.unpack('L',f.read(8))[0])
>
>
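Most of the four minutes above is probably per-item interpreter overhead: 100 million separate `read`, `unpack` and `append` calls. Below is a minimal sketch that moves that loop into C with a single `array.fromfile` call. It assumes Python 3.3+ for the 'Q' typecode, and it assumes 8-byte native unsigned values, which is what the quoted script reads on 64-bit Linux. `read_uint64s` is a hypothetical name.

```python
import array


def read_uint64s(f, count):
    """Read `count` native-endian unsigned 64-bit integers from binary file `f`."""
    values = array.array('Q')  # compact C array: 8 bytes per item, no per-int objects
    values.fromfile(f, count)  # one bulk read; raises EOFError if the file runs short
    return values
```

Called on `open('/dev/zero', 'rb')` with a count of 100000000, this keeps the data in one buffer of about 800 MB. The quoted loop instead builds 100 million separate int objects in a list.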
> cPickle with protocol 2 has some promise but is more complicated because
> arrays can't be pickled.  In a perfect world I could do something like this
> somewhere in the backroom:
>
> x = lengthy_number_crunching()
> magic.save_mmap("/important-data")
>
> and in the application do...
>
> x = magic.mmap("/important-data")
> magic.mlock("/important-data")
>
> and once the mlock finishes bringing important-data into RAM, at
> the speed of your disk I/O subsystem, all accesses to x will be
> hits against RAM.
>  
>
> Any thoughts?
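The save-once, map-later workflow described above can be approximated with the standard library. This is a sketch that assumes Python 3.3+ and 8-byte native unsigned values. `save_numbers` and `map_numbers` are hypothetical stand-ins for the imaginary `magic` module. `array.tofile` writes the raw values once. `mmap` plus `memoryview.cast('Q')` then exposes the file as a sequence of ints without parsing or copying it. The standard `mmap` module has no `mlock` call, so this sketch does not pin anything in RAM. Pages are read from disk the first time they are touched.

```python
import array
import mmap
from contextlib import contextmanager


def save_numbers(path, numbers):
    """Backroom step: write integers as raw native unsigned 64-bit values (hypothetical helper)."""
    with open(path, 'wb') as f:
        array.array('Q', numbers).tofile(f)


@contextmanager
def map_numbers(path):
    """Application step: expose the file as a read-only sequence of ints without copying it."""
    with open(path, 'rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping stays valid after the file object is closed.
    raw = memoryview(mm)
    view = raw.cast('Q')  # reinterpret the mapped bytes as native unsigned 64-bit ints
    try:
        yield view
    finally:
        # Release the buffer exports first; closing a map that is still exported raises BufferError.
        view.release()
        raw.release()
        mm.close()
```

Inside `with map_numbers(path) as x:`, `x[i]` returns a Python int read straight from the mapped page. Any views derived from `x` must not outlive the `with` block.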

Disable the garbage collector, use a while loop and manual index
instead of an iterator, preallocate your list, e.g.,
[None]*100000000, and hope they don't have blasters!
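The three suggestions above can be combined as a sketch in Python 3 (the quoted script is Python 2): preallocation with `[None] * count`, `gc.disable()` around the loop, and a manual `while` index. The `'=Q'` format replaces the script's native `'L'` so that each record is exactly 8 bytes on every platform. `load_preallocated` and `RECORD` are hypothetical names. How much each change saves depends on the interpreter version. In CPython a plain `for i in range(count)` loop is often at least as fast as a manual index, so it is worth timing both.

```python
import gc
import struct

RECORD = struct.Struct('=Q')  # 8-byte unsigned int, native byte order, standard size


def load_preallocated(f, count):
    """Read `count` 8-byte records from binary file `f` into a preallocated list."""
    data = [None] * count        # preallocate instead of growing with append()
    unpack = RECORD.unpack       # hoist attribute lookups out of the loop
    read = f.read
    was_enabled = gc.isenabled()
    gc.disable()                 # pause the cyclic GC during the bulk allocation
    try:
        i = 0
        while i < count:         # manual index, as suggested above
            data[i] = unpack(read(8))[0]
            i += 1
    finally:
        if was_enabled:
            gc.enable()          # restore the collector even if a read fails
    return data
```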

-- 
Neil Cerutti


