[Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup?

Glen Jarvis glen at glenjarvis.com
Sun Jun 26 03:48:54 CEST 2011


Stephen,
    BeautifulSoup really just parses the HTML. It doesn't (have to) retrieve the page for you.

    You can use the standard library's urllib/urllib2 (or the third-party httplib2, which is not built in) to retrieve the page, handling the login and cookies at that step, and then use BeautifulSoup to parse what comes back.
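
    A minimal sketch of that split, assuming a plain form-based login that sets a session cookie. It uses the Python 3 module names (urllib.request, http.cookiejar) in place of urllib2/cookielib. The function names, the URLs and the form field names are all hypothetical; take the real field names from the site's login form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """URL-encode form fields into the bytes body of a POST request."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def login_and_parse(login_url, page_url, fields):
    """POST the login form, then fetch a protected page and parse it."""
    # Third-party parser, imported here so the fetching code works without it:
    # pip install beautifulsoup4
    from bs4 import BeautifulSoup

    opener, _jar = make_session()
    # Supplying data makes this a POST; any session cookie lands in the jar.
    opener.open(login_url, data=encode_form(fields)).close()
    # The same opener resends the cookie, so the protected page is reachable.
    with opener.open(page_url) as response:
        return BeautifulSoup(response.read(), "html.parser")
```

    Usage would look something like login_and_parse("https://example.com/login", "https://example.com/members", {"username": "me", "password": "secret"}), with placeholder values throughout. Sites that add hidden tokens to the login form need an extra step: fetch the form first and copy those fields into the POST.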

Cheers,


Glen

On Jun 25, 2011, at 1:42 PM, Stephen McInerney <spmcinerney at hotmail.com> wrote:

> 
> What do people use for scraping on a website requiring (login form-based) authentication?
> BeautifulSoup: does not handle authentication or cookies
> Scrapy: does but more heavyweight paradigm to learn, incl. XPath
> 
> Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python
> 
> Thanks,
> Stephen
> 
> _______________________________________________
> Baypiggies mailing list
> Baypiggies at python.org
> To change your subscription options or unsubscribe:
> http://mail.python.org/mailman/listinfo/baypiggies