[Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup?

Dwight Hubbard dwight_hubbard at yahoo.com
Tue Jun 28 00:07:40 CEST 2011


For scraping with authentication I find the twill module is very good.
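twill drives a login with commands along the lines of `go`, `fv` (formvalue) and `submit`. The sketch below is not twill's API. It is a standard-library illustration of what such a tool automates for you: read the login form, keep its hidden fields (for example a CSRF token), and fill in the credentials. The field names `username` and `password` are hypothetical and vary from site to site.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect the action URL and <input> name/value pairs of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self._in_form:
            name = attrs.get("name")
            if name:
                # Hidden fields (e.g. CSRF tokens) keep the value the server sent.
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # only the first form is considered


def fill_login_form(html, username, password,
                    user_field="username", pass_field="password"):
    """Return (action, fields) ready to POST; the field names are assumptions."""
    parser = LoginFormParser()
    parser.feed(html)
    parser.close()
    fields = dict(parser.fields)
    fields[user_field] = username
    fields[pass_field] = password
    return parser.action, fields
```

For a page containing `<form action="/login"><input type="hidden" name="csrf" value="abc123"><input name="username"><input type="password" name="password"></form>`, `fill_login_form(page, "stephen", "secret")` gives the action `"/login"` and the fields `{"csrf": "abc123", "username": "stephen", "password": "secret"}`, which can then be POSTed with a cookie-aware client.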



>________________________________
>From: Glen Jarvis <glen at glenjarvis.com>
>To: Stephen McInerney <spmcinerney at hotmail.com>
>Cc: "<baypiggies at python.org>" <baypiggies at python.org>
>Sent: Saturday, June 25, 2011 6:48 PM
>Subject: Re: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup?
>
>
>Stephen,
>    Beautiful Soup really just parses the HTML. It doesn't (have to) retrieve the page for you.
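To illustrate that separation, here is a parser that works on HTML already held in memory, however it was downloaded. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so the sketch has no third-party dependency; with BeautifulSoup 4 the equivalent would be roughly `[a.get("href") for a in soup.find_all("a")]`. The function name `extract_links` is illustrative only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags from an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def extract_links(html):
    # No network access here: the caller supplies HTML fetched by any means.
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

For example, `extract_links('<a href="/a">A</a> <a name="x">no href</a>')` returns `["/a"]`; anchors without an `href` are skipped.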
>
>
>    You can use the standard library's urllib2 (or httplib), or the third-party httplib2, to retrieve the page (also with authentication) and then use BeautifulSoup to parse the page.
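A minimal sketch of that approach for a form-based login, written for Python 3, where urllib2's functionality lives in `urllib.request`. A cookie jar stores the session cookie set by the login response so that later requests through the same opener are authenticated. The URLs and form field names are hypothetical placeholders, and a real site may also require hidden fields such as a CSRF token.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, fields):
    """Build a form-encoded POST request for the login form."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(login_url, data=data, method="POST")


def fetch_after_login(login_url, fields, page_url, timeout=10):
    """Log in, then fetch page_url with the session cookie; returns decoded HTML."""
    opener, _jar = make_session()
    with opener.open(build_login_request(login_url, fields), timeout=timeout):
        pass  # body not needed; the cookie processor has captured Set-Cookie
    with opener.open(page_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would look like `html = fetch_after_login("https://example.com/login", {"username": "stephen", "password": "secret"}, "https://example.com/account")`, after which `html` can be handed to BeautifulSoup. Keeping fetching and parsing in separate functions mirrors the split described above.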
>
>Cheers,
>
>Glen
>
>On Jun 25, 2011, at 1:42 PM, Stephen McInerney <spmcinerney at hotmail.com> wrote:
>
>>What do people use for scraping on a website requiring (login form-based) authentication?
>>
>>	* BeautifulSoup: does not handle authentication or cookies
>>	* Scrapy: does, but has a more heavyweight paradigm to learn, including XPath
>>Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python
>>
>>Thanks,
>>Stephen
>>
>>
>>_______________________________________________
>>Baypiggies mailing list
>>Baypiggies at python.org
>>To change your subscription options or unsubscribe:
>>http://mail.python.org/mailman/listinfo/baypiggies