Fetching websites with Python

William Park opengeometry at yahoo.ca
Wed Mar 31 23:19:55 CEST 2004


Markus Franz <mf at orase.com> wrote:
> Hi.
> 
> How can I grab websites with a command-line python script? I want to start
> the script like this:
> 
> ./script.py ---xxx--- http://www.address1.com http://www.address2.com
> http://www.address3.com
> 
> The script should load these 3 websites (or more if specified) in parallel

In parallel?  Hmm... play around with
    lynx -dump http://... > a1 &
    lynx -dump http://... > a2 &
    lynx -dump http://... > a3 &
    sleep 15
    kill %1 %2 %3
    for i in a1 a2 a3; do
	cat $i
	echo ---xxx---
    done
    rm a1 a2 a3
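Since the original question asked for a Python script, here is a sketch of the same parallel fetch in Python itself. It assumes a modern standard library (`concurrent.futures` and `urllib.request`, which postdate this 2004 thread) and hardcodes the `---xxx---` separator rather than taking it from the command line:

```python
import concurrent.futures
import sys
import urllib.request

def fetch(url, timeout=15):
    # Return the page body as text, or "" if the fetch fails or times out.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""

def fetch_parallel(urls, timeout=15):
    # One worker thread per URL, so slow sites don't delay the others;
    # each body is followed by the ---xxx--- separator.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        bodies = pool.map(lambda u: fetch(u, timeout), urls)
    return "".join(body + "---xxx---\n" for body in bodies)

if __name__ == "__main__":
    sys.stdout.write(fetch_parallel(sys.argv[1:]))
```

One caveat: `urlopen`'s `timeout` bounds each socket operation, not total wall-clock time, so a very slow trickling server can still exceed 15 seconds overall.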

In serial, the code becomes
    for i in http://... http://...  http://... ; do
	lynx -connect_timeout=15 -dump $i 
	echo ---xxx---
    done

> (maybe processes? threads?) and show their contents separated by ---xxx---.
> The whole output should be printed on the command line. Each website should
> only have 15 seconds to return the contents (maximum) in order to avoid a
> never-ending script.
> 
> How can I do this?
> 
> Thanks.
> 
> Yours sincerely
> 
> Markus Franz
> 
> 

-- 
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
Linux solution for data processing and document management.
