python resource management
Philip Semanchuk
philip at semanchuk.com
Mon Jan 19 09:30:46 EST 2009
On Jan 19, 2009, at 3:12 AM, S.Selvam Siva wrote:
> Hi all,
>
> I am running a Python script that uses BeautifulSoup to parse nearly 22,000
> locally stored HTML files.
> The problem is that memory usage increases linearly as the files are
> parsed.
> After the script has parsed 200 files or so, it consumes all the available
> RAM and CPU usage drops to 0% (maybe due to excessive paging).
>
> We tried 'del soup_object' and 'gc.collect()', but saw no improvement.
>
> Please advise how to limit Python's memory usage, or the proper way to
> handle BeautifulSoup objects in a resource-efficient manner.
You need to figure out where the memory is disappearing. Try
commenting out parts of your script. For instance, maybe start with a
minimalist script: open and close the files but don't process them.
See if the memory usage continues to be a problem. Then add elements
back in, making your minimalist script more and more like the real
one. If the extreme memory usage problem is isolated to one component
or section, you'll find it this way.
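A minimal sketch of that incremental approach, assuming Python 3 and using only the standard library: `tracemalloc` reports how much memory is still allocated after each file, and each "stage" function stands for one version of the script as you add pieces back in. The stdlib `html.parser.HTMLParser` stands in for BeautifulSoup so the sketch runs without third-party packages. The names `open_only`, `open_and_parse`, `memory_profile`, `TextCollector` and the `pages/` directory are hypothetical.

```python
import glob
import tracemalloc
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical stand-in for BeautifulSoup: collects the text content."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def open_only(path):
    # Minimal stage: open and read the file, but do no parsing.
    with open(path, encoding="utf-8", errors="replace") as f:
        f.read()


def open_and_parse(path):
    # Next stage: read and parse, keeping no reference once this returns.
    with open(path, encoding="utf-8", errors="replace") as f:
        parser = TextCollector()
        parser.feed(f.read())
        parser.close()


def memory_profile(paths, stage):
    """Run `stage` on each path; return traced memory (bytes) after each file."""
    tracemalloc.start()
    readings = []
    try:
        for path in paths:
            stage(path)
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
    finally:
        tracemalloc.stop()
    return readings


if __name__ == "__main__":
    # Hypothetical location of the saved pages; adjust to your own layout.
    paths = sorted(glob.glob("pages/*.html"))
    for stage in (open_only, open_and_parse):
        readings = memory_profile(paths, stage)
        if readings:
            print(stage.__name__, "first:", readings[0], "last:", readings[-1])
```

If the readings stay roughly flat for `open_only` but climb steadily once a parsing stage is added, that stage (or something holding references to its results) is where the memory goes; the real BeautifulSoup calls can then be swapped in as the next stage. Note that `tracemalloc` only sees allocations made through Python's own memory allocators, so memory used directly by C extensions may not show up in these numbers.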
HTH
Philip