Parsing an HTML a tag
George Sakkis
gsakkis at rutgers.edu
Sat Sep 24 18:29:34 EDT 2005
"Stephen Prinster" <prinster at mail.com> wrote:
> George wrote:
> > How can I parse an HTML file and collect only the A tags? I have a
> > start for the code but am unable to figure out how to finish it.
> > HTML_parse gets the data from the URL document. Thanks for the help.
>
> Have you tried using Beautiful Soup?
>
> http://www.crummy.com/software/BeautifulSoup/
I agree; you can do what you want in a few lines:
import urllib
from BeautifulSoup import BeautifulSoup
# url is the address of the page to parse; collect the href of every <a> tag
hrefs = [link['href'] for link in BeautifulSoup(urllib.urlopen(url)).fetch('a')]
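If Beautiful Soup isn't available, roughly the same thing can be sketched with the standard library's html.parser module in Python 3. This is only a sketch under assumptions: LinkCollector and extract_hrefs are made-up names for illustration, and it takes the HTML as a string rather than fetching a URL.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper, for illustration)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)
                break  # keep only the first href on a tag


def extract_hrefs(html_text):
    """Return the href values of all <a> tags in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs
```

Anchors without an href attribute (for example, named anchors) are simply skipped, so the result contains only actual link targets.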
George