Memory issues when storing as List of Strings vs List of List
Tim Chase
python.list at tim.thechases.com
Tue Nov 30 10:15:47 EST 2010
On 11/30/2010 04:29 AM, OW Ghim Siong wrote:
> a=open("bigfile")
> matrix=[]
> while True:
>     lines = a.readlines(100000000)
>     for line in lines:
>         data=line.split("\t")
>         if several_conditions_are_satisfied:
>             matrix.append(data)
>     print "Number of lines read:", len(lines), "matrix.__sizeof__:", matrix.__sizeof__()
>     if len(lines)==0:
>         break
As others have mentioned, don't use .readlines(); use the
file object as an iterator instead. This can even be rewritten
as a simple list comprehension:
  from csv import reader
  matrix = [data
    for data
    in reader(file('bigfile.txt', 'rb'), delimiter='\t')
    if several_conditions_are_satisfied(data)
    ]
This assumes that you're throwing away most of the data, so that
the final "matrix" fits well within memory even if the source
file doesn't.
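For anyone on Python 3, here is a minimal sketch of the same idea. It is not a drop-in replacement: keep_row() is a hypothetical stand-in for whatever several_conditions_are_satisfied is meant to check, and load_matrix() is just a name chosen for illustration. The file is opened in text mode with newline="" as the csv docs recommend, and the reader pulls one line at a time from the file object, so only rows that pass the filter are kept.

```python
import csv


def keep_row(data):
    # Hypothetical filter standing in for the original poster's
    # conditions: keep rows whose first field is non-empty.
    return bool(data) and data[0] != ""


def load_matrix(path, predicate=keep_row):
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="") as f:
        # csv.reader consumes the file object lazily, line by line,
        # so the whole file is never held in memory at once.
        return [data
                for data in csv.reader(f, delimiter="\t")
                if predicate(data)]
```

Calling load_matrix("bigfile.txt") would then build the filtered list of lists; passing a different predicate swaps in other conditions without touching the reading loop.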
-tkc