4DOM eating all my memory
ewan
frimn at hotmail.com
Sun Feb 1 01:34:02 EST 2004
hello all -
I'm looping over a set of URLs pulled from a database, fetching the
corresponding web page for each, and building a DOM tree for it using
xml.dom.ext.reader.HtmlLib (then trying to match titles in a web library
catalogue). All the trees seem to be kept in memory: by the time I get
through fifty or so iterations, the program has used about half my memory
and slowed the system to a crawl.
I tried turning on all the gc debugging flags. They produce lots of output,
but it all says 'collectable', which sounds fine to me. I even tried calling
gc.collect() at the end of every iteration. Nothing changed. Everything
seems to be being collected, so why does each iteration increase the memory
usage by several megabytes?
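(For reference, the debugging setup described above looks roughly like this.
It's a self-contained toy with a hand-made reference cycle standing in for a
DOM tree, not the real crawler:)

```python
import gc

# Enable the gc debug flags mentioned above and force a collection.
# DEBUG_COLLECTABLE/DEBUG_STATS print what each pass finds;
# gc.collect() returns the number of unreachable objects it found.
gc.set_debug(gc.DEBUG_COLLECTABLE | gc.DEBUG_STATS)

class Node:                     # toy object, standing in for a DOM node
    def __init__(self):
        self.parent = None

a, b = Node(), Node()
a.parent, b.parent = b, a       # parent/child cycle, like DOM nodes
del a, b                        # drop the only outside references

unreachable = gc.collect()      # the cycle is collectable, so it is freed
gc.set_debug(0)
print(unreachable)
```

If the output labels everything 'collectable' (rather than 'uncollectable'),
the cycles are indeed being freed by the collector.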
Below is some code (and by the way, do I have those 'global's in the right
places?). Any suggestions would be appreciated immeasurably...
ewan
import MySQLdb
...
cursor = db.cursor()
result = cursor.execute("""SELECT CALLNO, TITLE FROM %s""" % table)
rows = cursor.fetchall()
cursor.close()

for row in rows:
    current_callno = row[0]
    title = row[1]
    url = construct_url(title)
    cf = callno_finder()
    cf.find(title.decode('latin-1'), url)
    ...
(meanwhile, in another file)
...
class callno_finder:
    def __init__(self):
        global root
        root = None

    def find(self, title, uri):
        global root
        reader = HtmlLib.Reader()
        root = reader.fromUri(uri)
        # find what we're looking for
        ...
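(On the 'global' question: `global root` inside a method rebinds a
module-level name, so every callno_finder shares one `root`, and the most
recent tree stays referenced by the module until the next find() overwrites
it; the last tree lives for the life of the program. A self-contained toy
illustrating the difference, with plain strings standing in for DOM trees:)

```python
# Toy classes, not the real HtmlLib code: a method declaring
# `global root` rebinds one shared module-level name, while an
# instance attribute is freed along with its owner.

root = None

class GlobalFinder:
    def find(self, tree):
        global root
        root = tree          # module-level: shared by all instances

class InstanceFinder:
    def find(self, tree):
        self.root = tree     # per-instance: dies with the finder

g1, g2 = GlobalFinder(), GlobalFinder()
g1.find("tree one")
g2.find("tree two")          # overwrites the shared name

i = InstanceFinder()
i.find("tree three")         # owned by this instance only
```

After this runs, the module-level `root` is "tree two" no matter which
GlobalFinder was used, while `i.root` is "tree three" and becomes
unreachable as soon as `i` does.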