[Tutor] HTML Parsing
Danny Yoo
dyoo at hashcollision.org
Wed May 28 19:49:26 CEST 2014
> I am using Python 3.3.3 on Windows 7. I would like to know what is the best
> method to do HTML parsing? For example, I want to connect to www.yahoo.com
> and get all the tags and their values.
For this purpose, you may want to look at the APIs that the search
engines provide, rather than trying to scrape the web pages meant for
human readers. Otherwise, your program will probably break whenever
the structure of the web site changes.
A search for search APIs comes up with hits like these:
https://developer.yahoo.com/boss/search/
https://developers.google.com/web-search/docs/#fonje_snippets
http://datamarket.azure.com/dataset/bing/search
https://pypi.python.org/pypi/duckduckgo2
If you can say more about what you're planning to do, perhaps someone
has already provided a programmatic interface to it.
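If you do end up parsing HTML directly, the standard library's
html.parser module is one place to start. What follows is only a
minimal sketch, not a recommendation over the APIs above: the class
name TagCollector, the helper collect_tags, and the sample string are
made up for illustration, and it only records start tags and their
attributes.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: record each start tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []  # list of (tag_name, {attribute: value}) pairs

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples.
        self.tags.append((tag, dict(attrs)))


def collect_tags(html_text):
    """Feed a string of HTML to the parser and return the tags seen."""
    parser = TagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.tags


# Made-up sample input, so the sketch runs without network access.
sample = '<html><body><a href="http://example.com">link</a><br/></body></html>'

for tag, attrs in collect_tags(sample):
    print(tag, attrs)
```

To try this on a live page, you could read the text with
urllib.request.urlopen and decode it before calling collect_tags. The
fragility concern still applies: anything that depends on where a tag
sits in the page will need updating when the site changes its layout.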