[Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup?

Peter Borocz peter.borocz at gmail.com
Sat Jun 25 23:38:27 CEST 2011


Although twill <http://twill.idyll.org/commands.html> is usually thought
of only as a testing tool, I've happily used it for the
authentication/cookie/form-handling portion and then BeautifulSoup for
the parsing. Twill can be configured to use BeautifulSoup directly, but
since you have direct access to the underlying page, you can use any
parsing library you like.
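
A rough sketch of that split, not a tested recipe: twill does the login and
keeps the session cookies, and the raw page is then handed to a separate
parser. To keep the sketch dependency-light, the parsing half uses the
standard library's html.parser in place of BeautifulSoup. The form number
"1", the field names "username"/"password", and the helper names
fetch_after_login and extract_links are all hypothetical. The sketch also
assumes a recent twill release in which the page source is available as
browser.html; older releases exposed it differently, so check the version
you have installed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag; stands in for any parsing library."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Parsing step: return all link targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def fetch_after_login(login_url, target_url, username, password,
                      form="1", user_field="username", pass_field="password"):
    """Login step: submit the login form with twill, then fetch a page.

    The form number and field names are assumptions; inspect the real
    login page (e.g. with twill's showforms) to find the actual ones.
    """
    # Imported lazily so the parsing half works without twill installed.
    from twill import browser
    from twill.commands import go, fv, submit

    go(login_url)
    fv(form, user_field, username)   # fill in the login form fields
    fv(form, pass_field, password)
    submit()
    go(target_url)                   # session cookies from the login are reused
    return browser.html              # assumption: recent twill; older APIs differ
```

With placeholder URLs, usage would look something like
extract_links(fetch_after_login("http://example.com/login",
"http://example.com/members", "me", "secret")). Swapping in BeautifulSoup
only changes extract_links; the twill half stays the same.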

PeterB

On Sat, Jun 25, 2011 at 1:42 PM, Stephen McInerney
<spmcinerney at hotmail.com> wrote:

>
> What do people use for scraping on a website requiring (login form-based)
> authentication?
>
>    - BeautifulSoup: does not handle authentication or cookies
>    - Scrapy: handles both, but is a more heavyweight paradigm to learn, incl. XPath
>
>
> Some discussion:
> http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python
>
> Thanks,
> Stephen



-- 

peter.borocz at gmail dot com