Fwd: [Tutor] html programming ???

Jeff Shannon jeff@ccvcorp.com
Wed, 03 Oct 2001 15:03:18 -0700


> From: Samir Patel <sampatel@cs.rmit.edu.au>
>
> hi Jeff Shannon,
> thanks for your help -- I actually implemented the things you told me to
> do, and I have also tried adding the links I get to a Queue ...
>
> Now another problem of mine is that I have to call this function
> recursively, because I want to find links inside the links I have already
> found, down to a depth of 3.  So how does recursion work in Python?
>

Recursion is pretty simple--you just call a function from within itself.  The main thing to watch for is that something really does limit the depth of the recursion.  Python has a limit to how deep it can recurse (see sys.getrecursionlimit()), but 3 levels deep is no problem at all.  In this case, I'd use the findlinks() function we've created, as-is, and wrap it in a recursive function, something like this:

def scanpage(url, depth=3, maxindent=None):
    linklist = findlinks(url)

    if maxindent is None:
        maxindent = depth
    # indent printouts progressively further at deeper levels
    indent = "  " * (maxindent - depth)
    print "%s URL -> %s:" % (indent, url)
    if depth > 1:
        for link in linklist:
            scanpage(link, depth - 1, maxindent)
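
(For anyone reading this from the archive without the earlier messages: findlinks() is something we put together previously.  A bare-bones version might look roughly like the following, using urllib and sgmllib--this is only a sketch of the general idea, not necessarily what we actually wrote.)

import urllib, sgmllib

class LinkCollector(sgmllib.SGMLParser):
    def reset(self):
        sgmllib.SGMLParser.reset(self)
        self.links = []
    def start_a(self, attrs):
        # attrs is a list of (name, value) pairs for each <a> tag
        for name, value in attrs:
            if name == "href":
                self.links.append(value)

def findlinks(url):
    parser = LinkCollector()
    parser.feed(urllib.urlopen(url).read())
    parser.close()
    return parser.links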

Note that scanpage() doesn't save the found links for later use; it just prints them.  If you want to save them, you'll need some way of keeping track of them.  If you don't care which page a given link came from, you can just append them all to some global list.  If you *do* care, you could use nested lists, or tuples of (link, source page), or perhaps a dictionary using links as keys and source pages as values, or... any number of different and potentially arbitrarily complicated schemes.  :)  (There's a sketch of the dictionary idea after the usage examples below.)  Assuming that this simple form is fine, you can use it like so:

scanpage("http://www.somedomain.com/index.html")

This will run the function using the default depth of 3.  It will then call itself on each link it finds, but using a depth of 2, and each of *those* links will be scanned with a depth of 1.  The links found at that depth will *not* be searched further.  Note that if you decide you don't want to use a depth of 3, you can call it directly with whatever depth you like, such as:

scanpage("http://www.somedomain.com/index.html", 5)
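
And for what it's worth, here's a rough sketch of that dictionary idea--the scanlinks() name, the global found dict, and the already-seen check are all just illustrations, not anything we've written so far:

found = {}   # maps each link to the page it was first found on

def scanlinks(url, depth=3):
    for link in findlinks(url):
        if not found.has_key(link):   # skip links we've already seen
            found[link] = url
            if depth > 1:
                scanlinks(link, depth - 1)

Once that's run, found.keys() gives you every link seen, and found[link] tells you which page each one first turned up on.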

Hope that helps...


Jeff Shannon
Technician/Programmer
Credit International