Python resource management

> On Jan 19, 2009, at 3:12 AM, S.Selvam Siva wrote:
>> Hi all,
>> I am running a Python script that uses BeautifulSoup to parse nearly
>> 22,000 locally stored HTML files.
>> The problem is that memory usage increases linearly as the files are
>> parsed.
>> After the script has parsed 200 files or so, it consumes all the
>> available RAM and CPU usage drops to 0% (maybe due to excessive
>> paging).
>> We tried 'del soup_object' and 'gc.collect()', but saw no improvement.
>> Please advise how to limit Python's memory usage, or the proper way to
>> handle BeautifulSoup objects in a resource-efficient manner.
> You need to figure out where the memory is disappearing. Try commenting
> out parts of your script. For instance, maybe start with a minimalist
> script: open and close the files but don't process them. See if the
> memory usage continues to be a problem. Then add elements back in, making
> your minimalist script more and more like the real one. If the extreme
> memory usage problem is isolated to one component or section, you'll find
> it this way.
> Philip
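
One way to follow Philip's suggestion is to run each stage of the script on its own and compare how much memory stays allocated afterwards. The sketch below is only an illustration, not the original poster's script: it assumes Python 3.4 or later (for the standard-library tracemalloc module), and the names measure_growth, open_only and read_only are made up for this example.

```python
import gc
import tracemalloc


def measure_growth(paths, step):
    """Call step(path) for every path; return the net growth in traced memory (bytes)."""
    gc.collect()
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for path in paths:
        step(path)
    gc.collect()  # discard collectable garbage so only retained memory is counted
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


def open_only(path):
    # Stage 1 (hypothetical): open and close the file, nothing else.
    with open(path, "rb"):
        pass


def read_only(path):
    # Stage 2 (hypothetical): also read the contents, but keep no reference to them.
    with open(path, "rb") as f:
        f.read()
```

Run measure_growth on a few hundred files with open_only, then read_only, then a step that also parses, and so on. The first stage whose growth scales with the number of files is the one holding on to memory. Note that tracemalloc only sees allocations made through Python's allocator, so memory taken by a C extension may not show up.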

Also, are you creating a separate soup object for each file or reusing one 
object over and over?
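
For comparison, here is a rough sketch of the "one object per file" pattern: build a fresh tree for each file, copy out only plain Python data, and release the tree before moving on. The function name extract_from_files and its parse/extract parameters are assumptions made for this example; they are not part of BeautifulSoup.

```python
def extract_from_files(paths, parse, extract):
    """Parse each file into its own tree, keep only what extract() returns."""
    results = []
    for path in paths:
        with open(path, "rb") as f:
            html = f.read()
        soup = parse(html)  # a new tree per file, never reused
        try:
            # extract() should return plain data (str, int, list of str, ...),
            # not nodes from the tree, or the whole tree stays referenced.
            results.append(extract(soup))
        finally:
            # Release the tree explicitly if the parser object supports it.
            decompose = getattr(soup, "decompose", None)
            if callable(decompose):
                decompose()
            del soup
    return results
```

With BeautifulSoup 4 (an assumption; the original post does not say which version is in use), parse could be `lambda html: BeautifulSoup(html, "html.parser")` and extract could be `lambda soup: soup.title.get_text() if soup.title else None`. The point to check in the real script is whether anything taken from the soup, such as tags or NavigableString objects, is stored in a list or dict that outlives the loop iteration, since those objects keep references back into the tree.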

