Fetching websites with Python

William Park opengeometry at yahoo.ca
Wed Mar 31 23:19:55 CEST 2004

Markus Franz <mf at orase.com> wrote:
> Hi.
> How can I grab websites with a command-line python script? I want to start
> the script like this:
> ./script.py ---xxx--- http://www.address1.com http://www.address2.com
> http://www.address3.com
> The script should load these 3 websites (or more if specified) in parallel

In parallel?  Hmm... play around with
    lynx -dump http://... > a1 &
    lynx -dump http://... > a2 &
    lynx -dump http://... > a3 &
    sleep 15
    kill %1 %2 %3
    for i in a1 a2 a3; do
	cat $i
	echo ---xxx---
    done
    rm a1 a2 a3

In serial, the code becomes
    for i in http://... http://... http://... ; do
	lynx -connect_timeout=15 -dump $i
	echo ---xxx---
    done
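If you do want to stay in Python, here is a minimal sketch using only the standard library of a current Python 3 (the 15-second timeout and the ---xxx--- separator come from the question; the helper names fetch/fetch_all are mine):

```python
import sys
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SEPARATOR = "---xxx---"

def fetch(url, timeout=15):
    """Return the page body, or an error note if the site is slow or down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception as exc:
        return "error fetching %s: %s" % (url, exc)

def fetch_all(urls, fetcher=fetch):
    """Fetch every URL in parallel (threads) and join bodies with the separator."""
    with ThreadPoolExecutor(max_workers=len(urls) or 1) as pool:
        bodies = list(pool.map(fetcher, urls))
    return ("\n%s\n" % SEPARATOR).join(bodies) + "\n" + SEPARATOR

if __name__ == "__main__":
    # ./script.py http://www.address1.com http://www.address2.com ...
    print(fetch_all(sys.argv[1:]))
```

Each thread blocks at most 15 seconds on its own socket, so a dead site can never hang the whole script; fetch_all returns in roughly the time of the slowest site, not the sum.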

> (maybe processes? threads?) and show their contents separated by ---xxx---.
> The whole output should be printed on the command line. Each website should
> only have 15 seconds (maximum) to return its contents, in order to avoid a
> never-ending script.
> How can I do this?
> Thanks.
> Yours sincerely
> Markus Franz

William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
Linux solution for data processing and document management.
