Web Crawler - Python or Perl?
Chuck Rhode
CRhode at LacusVeris.com
Thu Jun 12 15:26:59 EDT 2008
On Mon, 09 Jun 2008 10:48:03 -0700, disappearedng wrote:
> I know Python but not Perl, and I am interested in knowing which of
> these two are a better choice.
I'm partial to *Python*, but, the last time I looked, *urllib2* didn't
provide a time-out mechanism that worked under all circumstances. My
client-side scripts would usually hang when the server quit
responding, which happened a lot.
You can get around this by starting each *HTML* retrieval in its own
thread, giving it a deadline, and killing it if it doesn't finish
gracefully.
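
A minimal sketch of the thread-with-a-deadline idea, written for
Python 3 rather than the *urllib2* of the day. The name
fetch_with_deadline and its parameters are made up for illustration.
One caveat: Python threads cannot be forcibly killed, so this sketch
abandons a stuck daemon thread instead of killing it.

```python
import threading


def fetch_with_deadline(fetch, deadline_seconds):
    """Run fetch() in a worker thread.

    Returns fetch()'s result, or None if the deadline passes first.
    (Hypothetical helper; the name and signature are illustrative.)
    """
    result = {}

    def worker():
        try:
            result["value"] = fetch()
        except Exception as exc:
            # Remember the failure so the caller can see it.
            result["error"] = exc

    # A daemon thread will not keep the interpreter alive at exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_seconds)

    if thread.is_alive():
        # Deadline passed: stop waiting. The thread is abandoned,
        # not killed, and may still be blocked on the server.
        return None
    if "error" in result:
        raise result["error"]
    return result["value"]
```

A crawler might call it as
fetch_with_deadline(lambda: urllib.request.urlopen(url).read(), 10),
where url is whatever page is being fetched. Abandoned threads still
hold their sockets until the process exits, so a long-running crawler
would need to keep an eye on how many pile up.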
A quicker and considerably grittier solution is to supply timeout
parameters to the *curl* command through the shell. Execute the command
and retrieve its output through the *subprocess* module.
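
Roughly, the *curl* route could look like the sketch below. The
function names and default timeout values are assumptions chosen for
illustration. It assumes *curl* is installed and on the PATH, and it
uses Python 3.7+ for subprocess.run with capture_output. The curl
options used (--connect-timeout, --max-time, --silent, --show-error,
--location) are standard ones.

```python
import subprocess


def build_curl_command(url, connect_timeout=5, max_time=20):
    """Build a curl argument list with timeouts (illustrative defaults)."""
    return [
        "curl",
        "--silent",          # no progress meter
        "--show-error",      # but still report errors on stderr
        "--location",        # follow redirects
        "--connect-timeout", str(connect_timeout),  # seconds to connect
        "--max-time", str(max_time),                # seconds for the whole transfer
        url,
    ]


def fetch_with_curl(url, connect_timeout=5, max_time=20):
    """Run curl and return the response body as bytes."""
    command = build_curl_command(url, connect_timeout, max_time)
    # The extra subprocess timeout is a backstop in case curl itself hangs;
    # it raises subprocess.TimeoutExpired if it fires.
    completed = subprocess.run(
        command, capture_output=True, timeout=max_time + 5
    )
    if completed.returncode != 0:
        message = completed.stderr.decode(errors="replace").strip()
        raise RuntimeError(message)
    return completed.stdout
```

Passing the arguments as a list runs *curl* directly rather than
through a shell, which sidesteps quoting problems with odd characters
in URLs.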
--
.. Chuck Rhode, Sheboygan, WI, USA
.. 1979 Honda Goldwing GL1000 (Geraldine)
.. Weather: http://LacusVeris.com/WX
.. 64° — Wind SE 5 mph — Sky partly cloudy.