python resource management

Philip Semanchuk philip at semanchuk.com
Mon Jan 19 09:30:46 EST 2009


On Jan 19, 2009, at 3:12 AM, S.Selvam Siva wrote:

> Hi all,
>
> I am running a Python script which parses nearly 22,000 locally
> stored HTML files using BeautifulSoup.
> The problem is that memory usage increases linearly as the files are
> parsed.
> By the time the script has parsed 200 files or so, it consumes all
> the available RAM and the CPU usage drops to 0% (maybe due to
> excessive paging).
>
> We tried 'del soup_object' and used 'gc.collect()', but saw no
> improvement.
>
> Please guide me on how to limit Python's memory usage, or on the
> proper way to handle BeautifulSoup objects in a resource-effective
> manner.

You need to figure out where the memory is disappearing. Try
commenting out parts of your script. For instance, maybe start with a
minimalist script: open and close the files but don't process them.
See if the memory usage continues to be a problem. Then add elements
back in, making your minimalist script more and more like the real
one. If the extreme memory usage is isolated to one component or
section, you'll find it this way.
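A minimal sketch of that kind of bisection harness might look like the
following. It only measures peak memory with the stdlib resource module
(Unix-only; the ru_maxrss unit varies by platform), and the process_file
callable is a hypothetical stand-in for whatever BeautifulSoup work the
real script does -- start with it set to None and swap pieces of the
real script in one at a time:

```python
import gc
import resource  # Unix-only; not available on Windows


def peak_memory():
    """Peak resident set size of this process.

    Reported in kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_trial(paths, process_file=None):
    """Open every file; optionally run a processing step on its contents.

    With process_file=None this is the minimalist open-and-close script.
    Pass in progressively larger pieces of the real script and compare
    the peak memory reported after each run to isolate the leak.
    """
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        if process_file is not None:
            process_file(data)
    gc.collect()  # give the collector a chance before measuring
    return peak_memory()
```

Comparing run_trial(paths) against run_trial(paths, process_file=
parse_with_soup) -- where parse_with_soup is your real parsing code --
should show which step the growth comes from.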

HTH
Philip
