Although twill is usually thought of only as a testing tool, I&#39;ve happily used <a href="http://twill.idyll.org/commands.html">twill</a> for the authentication, cookie, and form-handling portion, then BeautifulSoup for the parsing. Twill can be configured to use BeautifulSoup directly, but since you have direct access to the underlying page, you can use any parsing library you like.<div>
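<div>A minimal sketch of that split, in case it helps: twill does the login and keeps the cookies, and a separate parser handles the page. The URLs, the form number "1", and the field names "username" and "password" are hypothetical, so check them with twill&#39;s showforms command. The login function assumes the twill 2.x or later Python API, where the page source is available as browser.html; older releases expose it through get_browser().get_html() instead. To keep the sketch dependency-free, the parsing step uses the standard library&#39;s html.parser rather than BeautifulSoup.</div>
<pre>
```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Parse step: works on any HTML string, however it was fetched."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def fetch_after_login(login_url, target_url, username, password):
    """Login step: twill keeps the session cookies between requests."""
    # Imported here so the parsing helpers load without twill installed.
    # Assumes the twill 2.x+ API; older releases differ.
    from twill import browser
    from twill.commands import go, fv, submit

    go(login_url)
    # Form "1" and both field names are hypothetical placeholders.
    fv("1", "username", username)
    fv("1", "password", password)
    submit()
    go(target_url)
    return browser.html
```
</pre>
<div>Calling extract_links(fetch_after_login(...)) chains the two steps. Because the second step only needs an HTML string, swapping in BeautifulSoup or lxml means changing extract_links alone.</div>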

<br></div><div>PeterB</div><div><br><div class="gmail_quote">On Sat, Jun 25, 2011 at 1:42 PM, Stephen McInerney <span dir="ltr">&lt;<a href="mailto:spmcinerney@hotmail.com">spmcinerney@hotmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div><div dir="ltr">

<br>What do people use for scraping a website that requires form-based login authentication?<br><ul><li>BeautifulSoup: does not handle authentication or cookies</li><li>Scrapy: handles both, but is a more heavyweight paradigm to learn, incl. XPath</li>

</ul><br>Some discussion: <a href="http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python" target="_blank">http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python</a><br>

<br>Thanks,<br>Stephen<br><br></div></div>

</blockquote></div><br><br clear="all"><br>-- <br><div><br></div>peter.borocz at gmail dot com<br>


</div>