[Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup?
glen at glenjarvis.com
Sun Jun 26 03:48:54 CEST 2011
BeautifulSoup really just parses the HTML. It doesn't (have to) retrieve the page for you.
You can use the standard-library urllib/urllib2 modules (or the third-party httplib2, which is a separate install) to retrieve the page, authentication included, and then use BeautifulSoup to parse it.
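
Here is a minimal sketch of that split, not a definitive recipe. It is written for Python 3, where the modules are named urllib.request and http.cookiejar (on Python 2 the equivalents are urllib2 and cookielib). The form field names "username" and "password", the example.com URLs, and the helper names make_session, build_login_request and fetch_title are all made up for illustration; a real site will have its own field names and may also require hidden fields such as a CSRF token.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def build_login_request(login_url, username, password):
    """Build the POST request that submits the login form.

    ASSUMPTION: the form fields are called "username" and "password".
    Check the name attributes of the <input> tags in the real form.
    """
    form = {"username": username, "password": password}
    data = urllib.parse.urlencode(form).encode("utf-8")
    # Supplying data makes urllib send a POST instead of a GET.
    return urllib.request.Request(login_url, data=data)


def fetch_title(opener, page_url):
    """Fetch a page through the logged-in opener, then parse it."""
    # Third-party package: pip install beautifulsoup4
    from bs4 import BeautifulSoup

    html = opener.open(page_url).read()
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.string if soup.title else None


# Usage (hypothetical URLs and credentials):
#   opener, jar = make_session()
#   opener.open(build_login_request("https://example.com/login",
#                                   "alice", "secret"))
#   print(fetch_title(opener, "https://example.com/members"))
```

The opener keeps whatever session cookie the login response sets and sends it back on later requests, so BeautifulSoup never needs to know about authentication at all.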
On Jun 25, 2011, at 1:42 PM, Stephen McInerney <spmcinerney at hotmail.com> wrote:
> What do people use for scraping on a website requiring (login form-based) authentication?
> BeautifulSoup: does not handle authentication or cookies
> Scrapy: does but more heavyweight paradigm to learn, incl. XPath
> Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python