BeautifulSoup

Peter Otten __peter__ at web.de
Wed Sep 2 04:39:34 EDT 2009


elsa wrote:

> if I have some HTML that looks like this:
> 
> <area coords="427,724,432,732" href="http://BioCyc.org/ECOLI/NEW-IMAGE?
> type=GENE-IN-CHROM-BROWSER&object=EG12309" onmouseover="return
> overlib('<b>Gene:</b> yjtD<BR><b>Product:</b> predicted rRNA
> methyltransferase, subunit of predicted rRNA
> methyltransferase<BR><b>Intergenic distances (bp):</b> yjjY< +400
> yjtD +214 >thrL');"><b>Gene:</b> yjtD<br /><b>Product:</b> predicted
> rRNA methyltransferase, subunit of predicted rRNA
> methyltransferase<br /><b>Intergenic distances (bp):</b> yjjY< +400
> yjtD +214 >thrL');" onmouseout="return nd();">
> </area>
> 
> is there an easy way to use BeautifulSoup to extract just the value of
> the href attribute?

>>> from BeautifulSoup import BeautifulSoup as BS
>>> html = "<area ..."
>>> BS(html).find("area")["href"]
u'http://BioCyc.org/ECOLI/NEW-IMAGE?\ntype=GENE-IN-CHROM-BROWSER&object=EG12309'
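The session above uses the old BeautifulSoup 3 module under Python 2 (hence the `from BeautifulSoup import ...` line and the u'' result). If a third-party parser isn't available, a similar result can be had from the standard library. The following is only a sketch that swaps in `html.parser.HTMLParser` instead of BeautifulSoup; the names `AreaHrefParser` and `area_hrefs` are hypothetical, and stripping whitespace out of the URL is an assumption (the `\n` in the output above most likely comes from the mail client wrapping the pasted HTML, not from the page itself).

```python
from html.parser import HTMLParser


class AreaHrefParser(HTMLParser):
    """Hypothetical helper: collect the href of every <area> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list
        # of (name, value) pairs with entity references already resolved.
        if tag == "area":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def area_hrefs(html, strip_whitespace=True):
    """Return the href values of all <area> tags in document order."""
    parser = AreaHrefParser()
    parser.feed(html)
    parser.close()
    if strip_whitespace:
        # Assumption: whitespace inside the URL (e.g. a newline introduced
        # by line-wrapping) is not meaningful, so remove all of it.
        return ["".join(h.split()) for h in parser.hrefs]
    return parser.hrefs


if __name__ == "__main__":
    sample = ('<area coords="427,724,432,732" '
              'href="http://BioCyc.org/ECOLI/NEW-IMAGE?\n'
              'type=GENE-IN-CHROM-BROWSER&object=EG12309" '
              'onmouseout="return nd();"></area>')
    print(area_hrefs(sample))
    # ['http://BioCyc.org/ECOLI/NEW-IMAGE?type=GENE-IN-CHROM-BROWSER&object=EG12309']
```

Quoted attribute values may contain markup such as the `overlib('<b>Gene:</b> ...')` call without confusing the parser. With the current bs4 package the one-liner from the session would read `BeautifulSoup(html, "html.parser").find("area")["href"]`.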
