Memory problem with Python
josiah.carlson at sbcglobal.net
Mon Jun 18 21:45:30 CEST 2007
Squzer Crawler wrote:
> On Jun 18, 11:06 am, "sor... at gmail.com" <sor... at gmail.com> wrote:
>> On Jun 17, 8:51 pm, Squzer Crawler <Squ... at gmail.com> wrote:
>>> i am developing a distributed environment in my college using Python. I
>>> am using threads in the client for downloading webpages. Even though i am
>>> reusing the thread, memory usage keeps increasing. I don't know why. I am
>>> using BerkeleyDB for the URL queue and BeautifulSoup for parsing the webpages.
>> Isn't the increased memory a result of storing the already
>> processed pages?
>> Look first at all places where your code instantiates new
>> objects - and make sure you don't keep references to such objects that
>> are not needed anymore.
>> Also, reusing threads has nothing to do with saving memory - but
>> with saving on thread creation time, if I understand your problem correctly.
> what about cyclic references.. can i use GC in my program..
> if so, please tell me how to implement it.. i am calling gc.collect()
> at the end of the fetching.. Will it reduce my program speed? Else in
> which way can i call it..?
Garbage collection should happen automatically as long as you are
deleting references to objects you no longer need. If gc.garbage isn't
empty, then you have unbreakable reference cycles. It seems more
likely, as soring at gmail says, that you are keeping copies of the things
you already parsed in memory.
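A minimal sketch of checking the collector after each batch of work (the `crawl_batch` function is a hypothetical placeholder for your real fetch/parse code):

```python
import gc

def crawl_batch(urls):
    # Hypothetical placeholder for the real fetch-and-parse work.
    for url in urls:
        pass

def run_with_gc_check(urls):
    crawl_batch(urls)
    # gc can only reclaim objects that are no longer reachable, so
    # make sure references are dropped before collecting.
    unreachable = gc.collect()  # returns the number of objects collected
    if gc.garbage:
        # Anything left here sits in a cycle the collector could not
        # free (e.g. cycles involving __del__ methods in old Pythons).
        print("uncollectable objects:", len(gc.garbage))
    return unreachable
```

If `gc.garbage` stays empty and memory still grows, the leak is almost certainly live references you are holding on to, not cycles.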
What you can do (if you aren't able to find the bug) is have a wrapper
program that repeatedly starts up your url fetcher via os.system().
Then have your url fetcher close itself down every few hours.
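A rough sketch of such a wrapper, assuming the fetcher is a standalone script that exits on its own after a few hours (the command string and `max_runs` cap are illustrative, not part of the original suggestion):

```python
import os
import sys

def supervise(cmd, max_runs=None):
    """Repeatedly launch `cmd` via os.system(), so any slow memory leak
    is wiped out each time the child process exits.  `max_runs` is an
    optional cap, mainly useful for testing the wrapper itself."""
    runs = 0
    while max_runs is None or runs < max_runs:
        status = os.system(cmd)
        if status != 0:
            print("fetcher exited with status", status)
        runs += 1
    return runs

# Example invocation (fetcher.py is a hypothetical script name):
# supervise('%s fetcher.py' % sys.executable)
```

Inside the fetcher, a simple `sys.exit()` after a fixed deadline or page count is enough; the operating system reclaims all of the process's memory on exit, which sidesteps the leak entirely.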