[Tutor] Unable to download <th>, <td> using Beautifulsoup

bruce badouglas at gmail.com
Fri Jul 29 20:32:40 EDT 2016


Hey Alan...

Wow APIs.. yeah.. would be cool!!!

I've scraped data from lots of public sites that have no issue with it
(as long as you're kind) but that have no clue about, or resources for,
offering APIs.

However, yeah, if you're looking to "rip" off a site that has adverts,
it's prob not a cool thing to do, no matter what tools are used.



On Fri, Jul 29, 2016 at 6:59 PM, Alan Gauld via Tutor <tutor at python.org> wrote:

> On 29/07/16 23:10, bruce wrote:
>
> > The most "complete" is the use of a headless browser. However, the
> > use/implementation of a headless browser has its own share of issues.
> > Speed, complexity, etc...
>
> Walter and Bruce have jumped ahead a few steps from where I was
> heading, but basically it's an increasingly common scenario where
> web pages are no longer primarily HTML but rather
> JavaScript programs that fetch data dynamically.
>
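To see why BeautifulSoup comes back with no <th> or <td> tags on such a page, it helps to look only at the markup the server actually sends. Below is a minimal sketch using the standard library's html.parser (so it runs without bs4 installed); the page in RAW_HTML, and the /api/results URL and fillTable function inside it, are made-up assumptions standing in for a JavaScript-driven site. BeautifulSoup parses the same raw text and does not run scripts, so it would find the same empty table.

```python
from html.parser import HTMLParser

# Hypothetical page: the <table> is shipped empty and a script is
# expected to fill it in after the browser fetches the data.
RAW_HTML = """
<html>
  <body>
    <table id="results"></table>
    <script>
      fetch("/api/results").then(r => r.json()).then(fillTable);
    </script>
  </body>
</html>
"""


class CellCounter(HTMLParser):
    """Count <th> and <td> start tags in the markup as delivered."""

    def __init__(self):
        super().__init__()
        self.counts = {"th": 0, "td": 0}

    def handle_starttag(self, tag, attrs):
        # Script bodies are treated as raw text, so nothing inside
        # <script> is ever reported here as a tag.
        if tag in self.counts:
            self.counts[tag] += 1


def count_cells(html):
    """Return how many <th> and <td> tags appear in the raw HTML."""
    parser = CellCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    print(count_cells(RAW_HTML))  # {'th': 0, 'td': 0}
```

Fed the HTML a browser ends up displaying after the script runs, the same counter would find the cells; the difference is entirely what the script added after loading.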
> A headless browser is the brute force way to deal with such issues
> but a better (purer?) way is to access the same API that the browser
> is using. Many web sites now publish RESTful APIs with web
> services that you can call directly. It is worth investigating
> whether your target has this. If so that will generally provide
> a much nicer solution than trying to drive a headless browser.
>
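A rough sketch of the "call the API directly" approach, assuming the page loads its table from a JSON endpoint. The URL and the {"columns": ..., "rows": ...} payload shape are hypothetical; a real site's endpoint and format have to be discovered (for example in the browser's developer tools, Network tab) and checked against the site's terms of use. Fetching and parsing are kept separate so the parsing can be tried offline.

```python
import json
import urllib.request

# Hypothetical endpoint: the real one, if any, has to be found by
# watching what the page requests while it loads.
API_URL = "https://example.com/api/results?page=1"


def fetch_json(url, timeout=10):
    """Fetch a URL and decode its body as JSON."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def rows_from_payload(payload):
    """Turn an assumed {"columns": [...], "rows": [[...], ...]} payload
    into a list of dicts, i.e. the data the <th>/<td> cells would show."""
    columns = payload["columns"]
    return [dict(zip(columns, row)) for row in payload["rows"]]


if __name__ == "__main__":
    # Offline demo with a made-up payload instead of a live request.
    sample = json.loads(
        '{"columns": ["name", "score"], "rows": [["Ann", 90], ["Bob", 85]]}'
    )
    print(rows_from_payload(sample))
    # [{'name': 'Ann', 'score': 90}, {'name': 'Bob', 'score': 85}]
```

Against a real endpoint you would call rows_from_payload(fetch_json(API_URL)) once the URL and JSON layout are confirmed. The standard library is used here to keep the sketch dependency-free; the third-party requests package is a common alternative.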
> Finally you need to consider whether you have the right to the
> data without running a browser? Many sites provide information
> for free but get paid by adverts. If you bypass the web screen
> (adverts) you bypass their revenue and they do not allow that.
> So you need to be sure that you are legally entitled to scrape
> data from the site or use an API.
>
> Otherwise you may be on the wrong end of a lawsuit, or at
> best be contributing to the demise of the very site you are
> trying to use.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos

