help!! *extra* tricky web page to extract data from...
steve at holdenweb.com
Wed Mar 14 01:13:58 CET 2007
Paul Rubin wrote:
> "Diez B. Roggisch" <deets at nospam.web.de> writes:
>> Obviously this wouldn't really help, as you can't predict what a
>> website actually wants which events, in possibly which
>> order. Especially if the site does not _want_ to be scrapable- think
>> of a simple "click on the images in the order of the numbers shown on
>> them" captcha.
> Sure, but most sites don't go to such lengths, and even captchas can
> be defeated if you're trying to scrape a specific site and are willing
> to spend effort on the particular captcha generator that it uses.
> Plus there is always www.captchasolver.com (!).
I especially like the terms and conditions they ask you to acknowledge if
you want to sign up as a worker:
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007