Web Crawler - Python or Perl?

Chuck Rhode CRhode at LacusVeris.com
Thu Jun 12 15:26:59 EDT 2008


On Mon, 09 Jun 2008 10:48:03 -0700, disappearedng wrote:

> I know Python but not Perl, and I am interested in knowing which of
> these two are a better choice.

I'm partial to *Python*, but, the last time I looked, *urllib2* didn't
provide a time-out mechanism that worked under all circumstances.  My
client-side scripts would usually hang when the server quit
responding, which happened a lot.  

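You can get around this by starting an *HTML* retrieval in its own
thread, giving it a deadline, and abandoning it if it doesn't finish
in time.  (Python has no supported way to kill a thread outright, so
"killing" in practice means no longer waiting for it.)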
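
A minimal sketch of the thread-with-deadline idea, assuming Python 3,
where *urllib2* has become *urllib.request*.  The names
``run_with_deadline`` and ``fetch`` are made up for illustration.  The
worker runs as a daemon thread, so if it overruns the deadline the
caller simply stops waiting and the thread won't hold up interpreter
exit:

```python
import threading
import urllib.request


def run_with_deadline(func, deadline_seconds, *args):
    """Run func(*args) in a daemon thread and wait at most deadline_seconds.

    Returns func's result, re-raises its exception, or raises TimeoutError
    if the deadline passes first (the thread is then left to run on its own).
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_seconds)  # wait no longer than the deadline

    if thread.is_alive():
        raise TimeoutError("no result within %s seconds" % deadline_seconds)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def fetch(url):
    """Retrieve a page body; hypothetical helper used as the worker."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

Something like ``run_with_deadline(fetch, 10, url)`` would then give up
after ten seconds.  The abandoned thread may still be holding a socket
until the process exits, which is part of why this approach is only a
workaround.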

A quicker and considerably grittier solution is to supply timeout
parameters to the *curl* command through the shell.  Execute the
command and retrieve its output through the *subprocess* module.
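
Roughly, and only as a sketch: the helper names and default timeouts
below are assumptions, and it presumes *curl* is installed and on the
PATH.  ``--connect-timeout`` and ``--max-time`` are curl's own options
for bounding the connection phase and the whole transfer.  The command
is passed as a list rather than a shell string, which sidesteps
quoting problems with odd URLs:

```python
import subprocess


def build_curl_command(url, connect_timeout=5, max_time=15):
    """Build the curl argument list with both timeouts applied."""
    return [
        "curl",
        "--silent",        # no progress meter
        "--show-error",    # but still report failures on stderr
        "--fail",          # non-zero exit status on HTTP errors
        "--connect-timeout", str(connect_timeout),
        "--max-time", str(max_time),
        url,
    ]


def fetch_with_curl(url, connect_timeout=5, max_time=15):
    """Run curl and return the page body as bytes.

    Raises subprocess.CalledProcessError if curl exits non-zero,
    which includes the case where one of the timeouts is hit.
    """
    command = build_curl_command(url, connect_timeout, max_time)
    completed = subprocess.run(command, capture_output=True, check=True)
    return completed.stdout
```

``capture_output`` needs Python 3.7 or later; on older versions,
``stdout=subprocess.PIPE`` and ``stderr=subprocess.PIPE`` do the same
job.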

-- 
.. Chuck Rhode, Sheboygan, WI, USA
.. 1979 Honda Goldwing GL1000 (Geraldine)
.. Weather:  http://LacusVeris.com/WX
.. 64° — Wind SE 5 mph — Sky partly cloudy.


