Memory issues when storing as List of Strings vs List of List

Tim Chase python.list at tim.thechases.com
Tue Nov 30 10:15:47 EST 2010


On 11/30/2010 04:29 AM, OW Ghim Siong wrote:
> a=open("bigfile")
> matrix=[]
> while True:
>      lines = a.readlines(100000000)
>      for line in lines:
>          data=line.split("\t")
>          if several_conditions_are_satisfied:
>              matrix.append(data)
>      print "Number of lines read:", len(lines), "matrix.__sizeof__:",
> matrix.__sizeof__()
>      if len(lines)==0:
>          break

As others have mentioned, don't use .readlines(); use the 
file object as an iterator instead.  This can even be rewritten 
as a simple list comprehension:

   from csv import reader
   matrix = [data
     for data
     in reader(file('bigfile.txt', 'rb'), delimiter='\t')
     if several_conditions_are_satisfied(data)
     ]

This assumes that you're throwing away most of the data, so that 
the final "matrix" fits well within memory even if the source 
file doesn't.
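
For anyone on Python 3, where file() no longer exists, the same 
idea might look roughly like the sketch below.  The predicate body 
and the load_matrix() helper are placeholders I made up for 
illustration; the original post doesn't say what the real 
conditions are.

```python
import csv


def several_conditions_are_satisfied(row):
    # Hypothetical stand-in for the real filter: keep rows whose
    # first field is the literal string "keep".
    return bool(row) and row[0] == "keep"


def load_matrix(path):
    # newline="" is what the csv docs recommend for files handed
    # to csv.reader in Python 3.
    with open(path, newline="") as f:
        # csv.reader pulls lines from the file iterator one at a
        # time, so only the rows that pass the filter are kept.
        return [
            row
            for row in csv.reader(f, delimiter="\t")
            if several_conditions_are_satisfied(row)
        ]
```

Calling load_matrix("bigfile.txt") would then build the filtered 
list without ever holding a whole batch of raw lines in memory, and 
the "with" block closes the file when the comprehension finishes.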

-tkc





