Populating huge data structures from disk

Michael Bacarella mbac at gpshopper.com
Tue Nov 6 13:18:45 EST 2007


For various reasons I need to cache about 8GB of data from disk into core on
application startup. 

Building this cache takes nearly 2 hours on modern hardware.  I am surprised
to discover that the bottleneck here is CPU.

 

This is surprising because I would expect something like the following to be
very fast:

 

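#!python

import array

a = array.array('L')
f = open('/dev/zero', 'rb')

# Append one 8-byte read per iteration. The loop never terminates on its
# own, because /dev/zero never reaches end of file.
while True:
    a.fromstring(f.read(8))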

Profiling this application shows all of the time is spent inside
a.fromstring.
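
A bounded version of the loop makes that kind of measurement repeatable. The following is only a sketch, written for Python 3, where array.fromstring is spelled array.frombytes. The names fill_array and profile_fill and the item count are made up for illustration, and the data comes from an in-memory buffer of zero bytes rather than /dev/zero so that it runs on any platform.

```python
import array
import cProfile
import io
import pstats


def fill_array(source, n_items):
    """Append n_items values to an array, one small read per item."""
    a = array.array('L')
    item_size = a.itemsize  # 4 or 8 bytes, depending on the platform
    for _ in range(n_items):
        a.frombytes(source.read(item_size))
    return a


def profile_fill(n_items=100_000):
    """Run fill_array under cProfile and return (array, pstats.Stats)."""
    # A buffer of zero bytes stands in for /dev/zero (an assumption made
    # for portability; it is not the original input).
    source = io.BytesIO(bytes(n_items * 8))
    profiler = cProfile.Profile()
    result = profiler.runcall(fill_array, source, n_items)
    return result, pstats.Stats(profiler)


if __name__ == "__main__":
    result, stats = profile_fill()
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
```

The report lists the read and frombytes calls separately, which shows how the time divides between them for a given item count.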

It makes little difference if I use a list instead of an array.

 

Is there anything I could tell the Python runtime to help it run this
pathologically slanted case faster?
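
One experiment that might narrow this down is to load the same data twice, once item by item and once with a single array.fromfile call, and time both. This is a sketch under assumptions, not a tested fix for the 8GB case: the function names, the temporary file, and the item count are placeholders, and the code is Python 3 (frombytes rather than fromstring). It makes no claim about the resulting timings, which would have to be measured on the real data.

```python
import array
import os
import tempfile
import time


def load_per_item(path, n_items):
    """Read one item per call, as in the loop from the original post."""
    a = array.array('L')
    with open(path, 'rb') as f:
        for _ in range(n_items):
            a.frombytes(f.read(a.itemsize))
    return a


def load_bulk(path, n_items):
    """Read all items with a single array.fromfile call."""
    a = array.array('L')
    with open(path, 'rb') as f:
        a.fromfile(f, n_items)
    return a


def compare(n_items=200_000):
    """Return (per_item_seconds, bulk_seconds) for the same input file."""
    data = array.array('L', range(n_items))
    fd, path = tempfile.mkstemp()  # hypothetical stand-in for the real data
    try:
        with os.fdopen(fd, 'wb') as f:
            data.tofile(f)
        t0 = time.perf_counter()
        slow = load_per_item(path, n_items)
        t1 = time.perf_counter()
        fast = load_bulk(path, n_items)
        t2 = time.perf_counter()
    finally:
        os.remove(path)
    # Both strategies must produce identical contents.
    assert slow == fast == data
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    per_item, bulk = compare()
    print("per item: %.4fs, bulk: %.4fs" % (per_item, bulk))
```

If the two timings differ widely, the cost is probably per-call overhead rather than the volume of data.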

 

 


