How to organize a substitution function with side effects

Sin Hang Kin kentsin at
Mon Jun 18 16:13:14 CEST 2001

The following code downloads a file from the web and removes the <img> tags
whose images are missing.

Since it needs to keep track of the link queue, baseurl and local storage, I
decided to wrap it in a class:

class urlgraber:
    def __init__(self, baseurl, localpath):
        self.baseurl = baseurl
        self.urlq = []
        self.localpath = localpath

    def retrieveimage(self, url):
        # retrieve the image, return true if the retrieval succeeds
        (fn, hdr) = urllib.urlretrieve(self.baseurl+url,
        return 1
        return 0

    def downloadpage(self, url):
        # download page
        # now, use re.sub to subst images:
        re.sub(imagepat, substfunc, page, 0)

The idea is that substfunc is called with each <img ....> match. It then
extracts the url, calls retrieveimage, and returns either <img src=localpath>
or <!-- img --> as the result.
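
A minimal sketch of that idea, not a definitive implementation. It assumes
Python 3 (where urlretrieve lives in urllib.request), and the names IMAGEPAT,
UrlGrabber, localname, fiximages and the fetch parameter are hypothetical:
none of them appear in the code above. The local file name (localpath plus the
basename of the url) is also an assumption. Making substfunc a bound method
lets re.sub call it while it can still see self.baseurl and self.localpath.

```python
import os
import re
import urllib.request

# Hypothetical pattern: matches an <img ...> tag and captures its src value.
IMAGEPAT = re.compile(
    r'<img\b[^>]*?\bsrc\s*=\s*["\']?([^"\'\s>]+)[^>]*>', re.IGNORECASE)


class UrlGrabber:
    def __init__(self, baseurl, localpath, fetch=urllib.request.urlretrieve):
        self.baseurl = baseurl
        self.localpath = localpath
        self.urlq = []
        # fetch is injectable (an assumption of this sketch) so the class
        # can be tried without network access.
        self.fetch = fetch

    def localname(self, url):
        # Assumed naming scheme: store the image under localpath by basename.
        return os.path.join(self.localpath, os.path.basename(url))

    def retrieveimage(self, url):
        # Return True if the image was retrieved, False otherwise.
        try:
            self.fetch(self.baseurl + url, self.localname(url))
            return True
        except (OSError, ValueError):
            return False

    def substfunc(self, match):
        # Called by re.sub once per <img ...> tag; the side effect is the
        # download, the return value is the replacement text.
        url = match.group(1)
        if self.retrieveimage(url):
            return '<img src="%s">' % self.localname(url)
        return '<!-- img -->'

    def fiximages(self, page):
        # Passing the bound method gives the callback access to self.
        return IMAGEPAT.sub(self.substfunc, page)
```

With something like grabber = UrlGrabber(baseurl, localpath), the call
grabber.fiximages(page) returns the rewritten page text; re.sub does not
modify page in place, so the result has to be kept.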

The problem is, it is very confusing to get the localpath and baseurl between

Can somebody help me re-organize this? What is the proper way to do this in


Kent Sin
kentsin at

