Create an index from a webpage [RANT, DNFTT]
simoncropper at fossworkflowguides.com
Thu Sep 8 22:09:58 EDT 2011
On 09/09/11 10:32, Rhodri James wrote:
> On Fri, 09 Sep 2011 00:40:42 +0100, Simon Cropper
> Ahem. You should expect a certain amount of ribbing after admitting that
> your Google-fu is weak. So is mine, but hey.
I did not admit anything. I consider my ability to find things quite good,
actually. Others assumed that my "Google-fu is weak".
>> 4. If someone is willing to help me, rather than lecture me (or poke
>> me to see if they get a response), I would appreciate it.
> The Google Python Sitemap Generator
> (fourth offering when you google "map a website with Python") looks like
> a promising start. At least it produces something in XML -- filtering
> that and turning it into HTML should be fairly straightforward.
I saw this in my original search. My conclusions were:
1. The last update was in 2005. That is 6 years ago. In that time we
have had numerous upgrades to HTML, Logs, etc.
2. The script expects to run on the webserver. I don't have the ability
to run Python on my webserver.
3. There are also a number of dead links and redirects to Google
Webmaster Central / Tools, which then request that you submit a sitemap (as I
alluded to, we get into a circular, confusing cross-referencing situation).
4. The ultimate product - if you can get the package to work - would be
an XML file you would need to massage to extract what you needed.
To me this seems like overkill.
I assume you could import the parent HTML file, scrape all the links on
the same domain, dump these to a hierarchical list and represent this in
HTML using BeautifulSoup or something similar. Certainly doable, but
considering the sheer commonality of this task I don't understand why a
simple script does not already exist - hence my original request for
It would appear from the feedback so far that this 'forum' is not the most
appropriate place to ask this question. Consequently, I will take your advice
and keep looking... and if I don't find something within a reasonable
time frame, just write something myself.
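
For reference, a minimal sketch of the approach outlined above (parse the
parent page, keep the links on the same domain, nest them by path and emit
an HTML list). It assumes Python 3 and uses the standard library's
html.parser in place of BeautifulSoup so that it stays self-contained. The
names LinkCollector, same_domain_links, build_tree and render_list are
hypothetical, and fetching the page itself is left out. It is untested
against any real site and is not a finished tool.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(page_html, base_url):
    """Return the sorted, absolute URLs that share base_url's host."""
    collector = LinkCollector()
    collector.feed(page_html)
    host = urlparse(base_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links, then drop any #fragment.
        absolute = urljoin(base_url, href).split("#")[0]
        if urlparse(absolute).netloc == host:
            links.add(absolute)
    return sorted(links)


def build_tree(urls):
    """Nest URLs by path segment, e.g. {'docs': {'intro.html': {}}}."""
    tree = {}
    for url in urls:
        node = tree
        for part in urlparse(url).path.strip("/").split("/"):
            if part:
                node = node.setdefault(part, {})
    return tree


def render_list(tree, prefix="/"):
    """Render the nested dict as nested <ul> markup.

    Assumption: every path segment gets a link, even an intermediate
    directory that may not have a page of its own.
    """
    if not tree:
        return ""
    items = []
    for name in sorted(tree):
        path = prefix + name
        items.append(
            '<li><a href="%s">%s</a>%s</li>'
            % (escape(path, quote=True), escape(name),
               render_list(tree[name], path + "/"))
        )
    return "<ul>" + "".join(items) + "</ul>"
```

Given the HTML of a page as a string, something like
render_list(build_tree(same_domain_links(page_html, "http://example.com/")))
would return the index markup, where example.com is only a placeholder.
Grouping by URL path is one guess at what "hierarchical" should mean here;
a real site may want to follow links recursively or group by page title.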
Simon Cropper - Open Content Creator / Website Administrator
Free and Open Source Software Workflow Guides
GIS Packages http://gis.fossworkflowguides.com
bash / Python http://scripting.fossworkflowguides.com