Fetching websites with Python
opengeometry at yahoo.ca
Wed Mar 31 23:19:55 CEST 2004
Markus Franz <mf at orase.com> wrote:
> How can I grab websites with a command-line python script? I want to start
> the script like this:
> ./script.py ---xxx--- http://www.address1.com http://www.address2.com
> The script should load these 3 websites (or more if specified) in parallel
In parallel? Hmm... play around with
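    lynx -dump http://... > a1 &
    lynx -dump http://... > a2 &
    lynx -dump http://... > a3 &
    sleep 15                        # give every fetch at most 15 seconds
    kill %1 %2 %3 2>/dev/null       # stop whatever is still running
    for i in a1 a2 a3; do
        echo ---xxx---
        cat $i
    done
    rm a1 a2 a3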
In serial, the code becomes
    for i in http://... http://... http://... ; do
        echo ---xxx---
        lynx -connect_timeout=15 -dump "$i"
    done
> (maybe processes? threads?) and show their contents separated by ---xxx---.
> The whole output should be printed on the command-line. Each website should
> only have 15 seconds to return the contents (maximum) in order to avoid a
> never-ending script.
> How can I do this?
> Yours sincerely
> Markus Franz
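
Since the question asks for a Python script, here is a minimal sketch of the parallel approach using threads. It assumes Python 3 and only the standard library. The names fetch_one, fetch_all and join_pages are made up for illustration. Note that the urlopen timeout bounds each blocking socket operation (connect, each read) rather than the whole download, so the 15-second limit is approximate.

```python
#!/usr/bin/env python3
"""Fetch several URLs in parallel and print them separated by a marker."""
import sys
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TIMEOUT = 15  # seconds per blocking network operation (from the question)


def fetch_one(url, timeout=TIMEOUT):
    """Return the body of url as text, or an error note if the fetch fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except Exception as exc:  # timeouts, DNS errors, HTTP errors, bad URLs
        return f"[failed to fetch {url}: {exc}]"


def fetch_all(urls, fetch=fetch_one):
    """Fetch all urls concurrently, one thread each; keeps input order."""
    if not urls:
        return []
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(fetch, urls))


def join_pages(pages, separator):
    """Join page bodies with the separator on a line of its own."""
    return f"\n{separator}\n".join(pages)


# Does nothing unless a separator and at least one URL are given.
if __name__ == "__main__" and len(sys.argv) > 2:
    separator, *urls = sys.argv[1:]
    print(join_pages(fetch_all(urls), separator))
```

It would be started the way the question describes, e.g. ./script.py ---xxx--- http://www.address1.com http://www.address2.com. The arguments are read straight from sys.argv because a separator such as ---xxx--- begins with a dash and an option parser would likely mistake it for a flag.
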
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
Linux solution for data processing and document management.