[Tutor] Grabbing data from changing website

Sean Novak snovak at snovak.com
Fri Jun 6 14:28:29 CEST 2008


I've recently been writing a web app with libxml2dom
(http://www.boddie.org.uk/python/libxml2dom.html). I had a look at
BeautifulSoup and found the two very similar. I ended up sticking with
libxml2dom because of a quote from its website: "Performance is fairly
respectable since libxml2dom makes direct use of libxml2mod - the
low-level wrapping of libxml2 for Python." I figured the app might
parse a little faster. I guess the only way to tell is to benchmark the
two against each other. Does anyone have input on the defining
differences, or reasons to use one over the other? Opinions welcome.
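
The speed question can only be settled by measuring. Here is a minimal timing sketch, assuming Python 3 and only the standard library. The two registered parsers (`html.parser` and `xml.dom.minidom`) are stand-ins, not libxml2dom or BeautifulSoup. To compare those two, you would register one-argument wrappers such as `lambda s: BeautifulSoup(s, "html.parser")` (bs4) and an equivalent wrapper around libxml2dom's HTML parsing call, which you should check against its documentation. The function names and the generated sample page are hypothetical.

```python
import functools
import timeit
import xml.dom.minidom
from html.parser import HTMLParser


def parse_with_html_parser(markup):
    """Stand-in parser 1: the stdlib event-driven HTMLParser (tolerates sloppy HTML)."""
    parser = HTMLParser()
    parser.feed(markup)
    parser.close()
    return parser


def parse_with_minidom(markup):
    """Stand-in parser 2: the stdlib DOM builder (requires well-formed XML)."""
    return xml.dom.minidom.parseString(markup)


def benchmark(parsers, markup, number=100, repeat=3):
    """Map each parser name to its best total time (seconds) for `number` parses."""
    results = {}
    for name, parse in parsers.items():
        # partial() binds the current parser and markup, avoiding late-binding surprises
        timings = timeit.repeat(functools.partial(parse, markup), number=number, repeat=repeat)
        results[name] = min(timings)
    return results


if __name__ == "__main__":
    # Hypothetical sample page; a page saved from the real site is a better test.
    rows = "".join("<tr><td>%d</td><td>item %d</td></tr>" % (i, i) for i in range(200))
    sample = "<html><body><table>%s</table></body></html>" % rows
    parsers = {"html.parser": parse_with_html_parser, "minidom": parse_with_minidom}
    for name, seconds in sorted(benchmark(parsers, sample).items(), key=lambda kv: kv[1]):
        print("%-12s %.4f s" % (name, seconds))
```

Taking the minimum of several repeats, as the `timeit` documentation suggests, reduces noise from other processes. For a realistic comparison, feed it a page saved from the actual site. Parsers differ in how they cope with malformed markup as well as in raw speed.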
On Jun 5, 2008, at 3:03 PM, Tony Cappellini wrote:

>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 4 Jun 2008 10:00:46 -0400
> From: James <jtp at nc.rr.com>
> Subject: [Tutor] Grabbing data from changing website
> To: tutor at python.org
> Message-ID:
>        <e107b4ff0806040700o33b1f221y2d8a6ba24ed9d55e at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
>
> >>urllib2 will grab the HTML. BeautifulSoup will parse it and allow
> >>fairly easy access. My writeup on each:
>
> I'll second Kent's vote for BeautifulSoup.
> I had never done any web programming, but using BS I quickly wrote a
> small program that downloads an image from a site.
> The image changes daily, and its filename and directory are different
> each day. BS made it very easy.
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
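
The approach described in the quoted replies can be sketched as follows: fetch the page with urllib2, parse it, then pull out an image whose filename changes daily. The sketch assumes Python 3, where `urllib.request` takes over urllib2's role. To keep it self-contained, the stdlib `html.parser` stands in for BeautifulSoup. With bs4, the collector class reduces to roughly `[img.get("src") for img in soup.find_all("img")]`. The `keyword` filter and all function names are hypothetical. A real page may need matching on an `id`, `alt` text, or the surrounding markup instead.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags with no (or empty) src
                self.sources.append(src)


def find_image_urls(html, page_url, keyword=""):
    """Return absolute URLs of images whose src contains the hypothetical `keyword`."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    # urljoin resolves relative paths like "/2008/06/05/pic.gif" against the page URL
    return [urljoin(page_url, src) for src in collector.sources if keyword in src]


def download_daily_image(page_url, keyword, dest_dir="."):
    """Fetch the page, locate the first matching image, and save it under its own filename."""
    with urllib.request.urlopen(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    urls = find_image_urls(html, page_url, keyword)
    if not urls:
        raise ValueError("no <img> matching %r on %s" % (keyword, page_url))
    image_url = urls[0]
    # Keep the site's (daily-changing) filename rather than hard-coding one
    filename = os.path.basename(urlparse(image_url).path) or "image"
    path = os.path.join(dest_dir, filename)
    urllib.request.urlretrieve(image_url, path)
    return path
```

For example, `download_daily_image("http://example.com/daily/", "comics")` (hypothetical URL and keyword) would save today's image under whatever filename the site used that day. Resolving each `src` with `urljoin` matters because daily images are often linked with relative paths.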


