Retrieving Info From Web W/ Python

Cameron Laird claird at lairds.com
Fri May 9 12:04:56 EDT 2003


In article <mailman.1052486416.17716.python-list at python.org>,
Laurence Spector <laurence at trdlnk.com> wrote:
>I am new to Python, and I noticed there is a geturl() function that gets
>the contents of a web address. I am trying to get Python to do this,
>except first it has to input a username, password, and then press
>log-in. Then it needs to click another link. And finally, print the web
>page. How do I get Python to "click" links and input information on the
>web in order to get to dynamically generated web pages?
> 
>I'd appreciate if anyone has any ideas on how to make such "web macros"
>with Python. I assume it uses the CGI module, but the instructions only
>seem to indicate how to take data from web pages and create new web
>pages that incorporate it. Thanks,
			.
			.
			.
No.

Yes, this is something Python does.  No, the CGI module is NOT
the direction you want to go, although it certainly has attracted
many of your predecessors along the same path.  CGI is for writing
the server side of a Web application; what you describe is a
client.
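
Here is a minimal sketch, in present-day Python with only the
standard library, of what "input a username, password, and press
log-in" amounts to from the client side: an HTTP POST of the form
fields, with cookies kept so that later requests stay logged in.
The function names, the login URL, and the field names "username"
and "password" are placeholders of mine, not anything from your
site; read the real ones out of the login page's <form> markup.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_browser():
    """Return (opener, jar): an opener that remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """Encode form fields the way a browser does when a form is submitted."""
    return urllib.parse.urlencode(fields).encode("ascii")


def log_in(opener, login_url, username, password):
    """POST the login form and return the response body as text.

    The field names "username" and "password" are assumptions; a real
    site may use other names and may require extra hidden fields.
    """
    data = encode_form({"username": username, "password": password})
    # Passing data makes this a POST rather than a GET.
    with opener.open(login_url, data) as response:
        return response.read().decode("utf-8", "replace")
```

Hypothetical use: opener, jar = make_browser(), then
log_in(opener, "http://example.invalid/login", "me", "secret").
Later opener.open() calls on the same opener send the session
cookie back automatically.  Note that geturl() on a response only
reports the final URL; read() is what returns the page contents.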

We generally call this "Web scraping"; introductory material
appears at <URL: http://www.unixreview.com/documents/s=7822/ur0302h/ >.
I don't think anyone's written a good guide to intermediate topics
in Web scraping with Python; it's regarded as rather a hackerish
topic, and diffuses by word of mouth.  The old Perl Web clients
book is probably your best bet.

The case at hand sounds like something that an experienced scraper
can automate in twenty minutes, with any luck--perhaps less.  I
haven't figured out a succinct way to express the expertise that
goes into that.  My advice:  try working with lynx and/or cURL
first, to get the concepts down.
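
Once the concepts are down, the "click another link" step is
mostly a matter of finding the link's href in the page you already
fetched and requesting that URL.  A rough sketch with the standard
library's HTML parser follows; the class and function names, the
sample link text, and the example.invalid URLs are all made up for
illustration, and real pages may need more forgiving matching.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def find_link(html, base_url, link_text):
    """Return the absolute URL of the first link whose text matches, or None."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    for text, url in parser.links:
        if text == link_text:
            return url
    return None
```

"Clicking" is then just fetching whatever find_link() returns,
using the same cookie-aware opener that performed the log-in, and
printing the result.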
-- 

Cameron Laird <Cameron at Lairds.com>
Business:  http://www.Phaseit.net
Personal:  http://phaseit.net/claird/home.html



