>> Take a look at AWStats (not Python).
> Doesn't this 'only' parse weblogs?  I'd still need some kind of spider
> to tell me all the possible resources available wouldn't I?  It's a
> big website, with 1000s of pages.

If some pages are no longer linked from any root page, a spider
won't find them. Those dangling pages are precisely the sort of
thing you're trying to remove, so consider other options, such as
walking the filesystem and comparing it against what is referenced.
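
As a rough sketch of the filesystem approach (not a finished tool): walk
the document root, build the set of files that exist, and subtract the
set of paths you know are referenced, whether that set comes from a
spider or from parsed logs. The names list_site_files and find_dangling,
the extension list, and the assumption that file paths map directly onto
URL paths are mine for illustration; a site with rewrites or dynamic
pages would need more work.

```python
import os


def list_site_files(docroot, extensions=(".html", ".htm")):
    """Return URL-style paths (e.g. '/a/b.html') for matching files under docroot.

    Assumes files map one-to-one onto URL paths, which may not hold
    if the server rewrites URLs or generates pages dynamically.
    """
    found = set()
    for dirpath, _dirnames, filenames in os.walk(docroot):
        for name in filenames:
            if name.lower().endswith(extensions):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, docroot)
                # Normalise to forward slashes so paths compare with URLs.
                found.add("/" + rel.replace(os.sep, "/"))
    return found


def find_dangling(docroot, referenced_paths):
    """Return files on disk that never appear in referenced_paths, sorted."""
    return sorted(list_site_files(docroot) - set(referenced_paths))
```

You would call something like find_dangling("/path/to/docroot", paths)
where paths is whatever your spider or log parser collected. Anything it
returns is only a candidate for removal and still wants checking by hand.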

