[Tutor] parser recommendations (was Re: Tutor Digest, Vol 142, Issue 11)

bruce badouglas at gmail.com
Mon Dec 14 17:58:54 EST 2015


beautifulsoup, selenium + PhantomJS, and dryscrape

I have no knowledge of dryscrape; I've never used it.

The other tools/apps are used to handle/parse html/websites.

Soup can handle xml/html as well as other input structs. It's good for
parsing the resulting struct/dom to extract data, or to change/modify
the struct itself.
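
As a rough sketch of both uses (extracting data, then modifying the
tree), something like the following should work. The sample markup,
the example.com prefix and the variable names are made up for
illustration; it assumes the beautifulsoup4 package is installed and
uses the stdlib "html.parser" backend.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Made-up sample page; in practice this would be html you fetched or read.
html = """
<html><body>
<h1>Quotes</h1>
<ul>
  <li class="quote"><a href="/q/1">First quote</a></li>
  <li class="quote"><a href="/q/2">Second quote</a></li>
</ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Extract data: collect (link text, href) pairs from every <a> tag.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

# Modify the struct itself: change the heading and make the links absolute.
soup.h1.string = "Stock quotes"
for a in soup.find_all("a"):
    a["href"] = "http://example.com" + a["href"]

print(links)
print(soup.prettify())
```

The same find_all()/attribute access works on a tree parsed from a real
page; only the way you obtain the html string changes.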

Selenium is a framework that drives a browser env, allowing you to
'test' the site/html. It's good for certain uses regarding testing.

PhantomJS/CasperJS are essentially headless browsers that also allow
you to run/parse websites. While Soup is more for static pages,
Phantom, because it's an actual headless browser, allows you to deal
with dynamic sites as well as static ones.
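
To illustrate the static vs. dynamic point, here is a small sketch
(the markup and names are invented, and it assumes beautifulsoup4 is
installed). Soup only parses the html text it is given and never runs
the script, so a value that JavaScript would fill in is simply not in
the parsed tree:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Made-up page: the price <div> is empty in the html source and would
# only be filled in by the script when a real browser runs it.
html = """
<html><body>
<div id="price"></div>
<script>
  document.getElementById("price").textContent = "42.50";
</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Soup sees the static markup only; the script was never executed.
price_text = soup.find(id="price").get_text()

# The script is available, but just as unevaluated source text.
script_source = soup.script.string

print(repr(price_text))
print(script_source)
```

For a page like this you would need a tool that actually executes the
JavaScript (a headless browser) and then hand the rendered html to Soup
if you still want its parsing.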




On Mon, Dec 14, 2015 at 2:56 PM, Alan Gauld <alan.gauld at btinternet.com> wrote:
> On 14/12/15 16:16, Crusier wrote:
>
> Please always supply a useful subject line when replying to the digest
> and also delete all irrelevant text. Some people pay by the byte and we
> have all received these messages already.
>
>> Thank you very much for answering the question. If you don't mind,
>> please kindly let me know which library I should focus on among
>> beautifulsoup, selenium + PhantomJS, and dryscrape.
>
> I don't know anything about the others but Beautiful soup
> is good for html, especially badly written/generated html.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos

