Create an index from a webpage [RANT, DNFTT]

Simon Cropper simoncropper at
Thu Sep 8 22:09:58 EDT 2011

On 09/09/11 10:32, Rhodri James wrote:
> On Fri, 09 Sep 2011 00:40:42 +0100, Simon Cropper
> Ahem. You should expect a certain amount of ribbing after admitting that
> your Google-fu is weak. So is mine, but hey.

I did not admit anything. I consider my ability to find things quite good, 
actually. Others assumed that my "Google-fu is weak".

>> 4. If someone is willing to help me, rather than lecture me (or poke
>> me to see if they get a response), I would appreciate it.
> The Google Python Sitemap Generator
> (,
> fourth offering when you google "map a website with Python") looks like
> a promising start. At least it produces something in XML -- filtering
> that and turning it into HTML should be fairly straightforward.

I saw this in my original search. My conclusions were:

1. The last update was in 2005. That is 6 years ago. In that time we 
have had numerous upgrades to HTML, Logs, etc.
2. The script expects to run on the webserver. I don't have the ability 
to run python on my webserver.
3. There are also a number of dead links and redirects to Google 
Webmaster Central / Tools, which then request that you submit a sitemap 
(as I alluded to, we get into a circular, confusing cross-referencing 
situation).
4. The ultimate product - if you can get the package to work - would be 
an XML file you would need to massage to extract what you needed.

To me this seems like overkill.

I assume you could import the parent HTML file, scrape all the links on 
the same domain, dump these to a hierarchical list and represent this in 
HTML using BeautifulSoup or something similar. Certainly doable, but 
considering the sheer commonality of this task I don't understand why a 
simple script does not already exist - hence my original request for 
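
For illustration, the approach described above could be sketched roughly 
as follows. This is a minimal sketch under assumptions, not an existing 
tool: the names LinkCollector, same_domain_links, build_tree and 
tree_to_html are made up for the example, it works on HTML text that has 
already been fetched, and it uses the standard library's html.parser in 
place of BeautifulSoup so that it runs without extra packages.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_domain_links(page_html, base_url):
    """Return sorted, de-duplicated absolute links on base_url's host."""
    collector = LinkCollector()
    collector.feed(page_html)
    host = urlparse(base_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urljoin(base_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == host:
            links.add(absolute)
    return sorted(links)


def build_tree(links):
    """Nest links by path segment, e.g. {'docs': {'a.html': {}}}."""
    tree = {}
    for link in links:
        node = tree
        for part in urlparse(link).path.strip("/").split("/"):
            if part:
                node = node.setdefault(part, {})
    return tree


def tree_to_html(tree, prefix=""):
    """Render the nested dict as nested <ul> lists of links."""
    if not tree:
        return ""
    items = []
    for name in sorted(tree):
        path = f"{prefix}/{name}"
        items.append(f'<li><a href="{escape(path)}">{escape(name)}</a>'
                     f"{tree_to_html(tree[name], path)}</li>")
    return "<ul>" + "".join(items) + "</ul>"
```

The page text could come from urllib.request.urlopen(url).read().decode(), 
then tree_to_html(build_tree(same_domain_links(text, url))) gives the 
nested list. Note that this only looks at one page and does not crawl, 
and that intermediate folders are rendered as links even if the site has 
no page at that path.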

It would appear from the feedback so far this 'forum' is not the most 
appropriate to ask this question. Consequently, I will take your advice 
and keep looking... and if I don't find something within a reasonable 
time frame, just write something myself.

Cheers Simon

    Simon Cropper - Open Content Creator / Website Administrator

    Free and Open Source Software Workflow Guides
    GIS Packages     
    bash / Python
