Creating Long Lists

Kelson Zawack zawackkfb at gis.a-star.edu.sg
Mon Feb 21 21:57:25 EST 2011


I have a large (10 GB) data file for which I want to parse each line into 
an object and then append that object to a list for sorting and further 
processing.  I have noticed, however, that as the length of the list 
increases, the rate at which objects are added to it decreases 
dramatically.
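
To make the setup concrete, the loop is essentially the following pattern 
(the Record class, the tab-separated parsing, and the file name here are 
simplified stand-ins for my real code):

class Record(object):
    def __init__(self, line):
        fields = line.rstrip("\n").split("\t")
        self.key = fields[0]
        self.rest = fields[1:]

records = []
with open("data.txt") as f:            # placeholder file name
    for line in f:
        records.append(Record(line))   # this append gets slower as the list grows

records.sort(key=lambda r: r.key)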
My first thought was that I was nearing the memory capacity of the machine 
and that the drop in performance was due to the OS swapping things in and 
out of memory, but when I looked at the memory usage this was not the 
case.  My process was the only job running and was consuming 40 GB of the 
total 130 GB, and no swapping processes were running.

To make sure there was not some problem with the rest of my code, or with 
the server's file system, I ran the program again as it was but without 
the line that appends items to the list, and it completed without problem, 
indicating that the decrease in performance is the result of some part of 
the process of appending to the list.
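
For what it is worth, a rough harness along these lines (again with a 
placeholder file name and placeholder per-line parsing) should make the 
per-chunk slowdown visible without the rest of my processing code:

import time

records = []
start = time.time()
with open("data.txt") as f:            # placeholder file name
    for i, line in enumerate(f, 1):
        # stand-in for the real parsed object; splitting into fields is
        # enough to put millions of small objects on the heap
        records.append(line.rstrip("\n").split("\t"))
        if i % 1000000 == 0:
            now = time.time()
            print("%d objects appended, last million took %.1f s" % (i, now - start))
            start = now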
Since other people have observed this problem as well 
(http://tek-tips.com/viewthread.cfm?qid=1096178&page=13, 
http://stackoverflow.com/questions/2473783/is-there-a-way-to-circumvent-python-list-append-becoming-progressively-slower-i) 
I did not bother to analyze or benchmark it further.  The answers in those 
threads do not seem very definitive, though, so I thought I would ask here 
what the reason for this decrease in performance is, and whether there is 
a way, or another data structure, that would avoid the problem.



