Create an index from a webpage [RANT, DNFTT]

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Sep 8 22:14:55 EDT 2011


Simon Cropper wrote:

> 1. Being told to google-it when I explicitly stated in my initial post
> that I had been doing this and had not been able to find anything is
> just plain rude. It is unconstructive and irritating.

Why, so you did. Even though I wasn't the one who told you to google it, I'll
apologise too, because I was thinking the same thing. Sorry about that.


> 3. Some searches, particularly for common terms throw millions of hits.
> 'Python' returns 147,000,000 results on google, 'Sitemap' returns
> 1,410,000,000 results. Even 'Python AND Sitemap' still returns 5,020
> results. 

How about "python generate a site map"? The very first link on DuckDuckGo is
this:

http://www.conversationmarketing.com/2010/08/python-sitemap-crawler-1.htm

Despite the domain, there is actual Python code on the page. Unfortunately
it looks like crappy code with broken formatting and a mix of <\br> tags,
but it's a start.

Searching for "site map" on PyPI returns a page full of hits:

http://pypi.python.org/pypi?%3Aaction=search&term=site+map&submit=search

Most of them seem to rely on a framework such as Django, but you might find
something useful.


> 4. AND YES, I could write a program but why recreate code when there is
> a strong likelihood that code already exists.

"Strong" likelihood? Given how hard it is to find an appropriate sitemap
generator written in Python, I'd say the likelihood that one exists which
meets your needs and is publicly available under an appropriate licence is
vanishingly small.

If you do decide to write your own, please consider uploading it to PyPI
under a FOSS licence.
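
For what it's worth, the core of such a tool need not be large. Below is a
rough sketch using only the standard library (Python 3), assuming all you
want is the same-site links found on a single page, written out in the
sitemaps.org XML format. The names LinkCollector, same_site_links and
build_sitemap are made up for illustration, and fetching the page, crawling
beyond one page, and honouring robots.txt are all left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in the document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, base_url):
    """Return a sorted list of absolute http(s) URLs on base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links, then drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.add(absolute)
    return sorted(found)


def build_sitemap(urls):
    """Serialise the URLs as a sitemaps.org <urlset> document (a string)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

You would feed same_site_links() the HTML of a page you have already
downloaded, along with that page's URL, and pass the result to
build_sitemap(). A real crawler would repeat this for each new link it
finds, keeping track of the pages it has already visited.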



-- 
Steven



