HTML Parsing
Ayaz Ahmed Khan
ayaz at dev.slash.null
Sun Feb 11 02:05:45 EST 2007
"mtuller" typed:
> I have also tried Beautiful Soup, but had trouble understanding the
> documentation
As Gabriel has suggested, spend a little more time going through the
documentation of BeautifulSoup. It is pretty easy to grasp.
Here is an example: I want to extract the text inside the following
span element from a large HTML source file.
<span class="title">Linux Kernel Bluetooth CAPI Packet Remote Buffer Overflow Vulnerability</span>
>>> import re
>>> from BeautifulSoup import BeautifulSoup
>>> from urllib2 import urlopen
>>> soup = BeautifulSoup(urlopen('http://www.someurl.tld/'))
>>> title = soup.find(name='span', attrs={'class':'title'}, text=re.compile(r'^Linux \w+'))
>>> title
u'Linux Kernel Bluetooth CAPI Packet Remote Buffer Overflow Vulnerability'
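
If BeautifulSoup is not installed, roughly the same extraction can be
sketched with the standard library alone. This is only a sketch in
Python 3 syntax using html.parser, not BeautifulSoup. The names
TitleSpanParser, find_titles and sample are made up for illustration,
and the sample HTML (apart from the title span above) is invented. It
collects the text of every span whose class is "title" and keeps those
that match the same regular expression.

```python
import re
from html.parser import HTMLParser


class TitleSpanParser(HTMLParser):
    """Collect the text of every <span class="title"> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside a matching span
        self.buffer = []    # text pieces of the span being read
        self.titles = []    # finished span texts

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            # A span nested inside a title span: track it so the
            # matching end tag does not close the outer one early.
            self.depth += 1
        elif "title" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.titles.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def find_titles(html, pattern=r"^Linux \w+"):
    """Return the texts of title spans that match the given pattern."""
    parser = TitleSpanParser()
    parser.feed(html)
    parser.close()
    return [t for t in parser.titles if re.search(pattern, t)]


# Invented sample document; only the first span comes from the example above.
sample = (
    '<html><body>'
    '<span class="title">Linux Kernel Bluetooth CAPI Packet Remote '
    'Buffer Overflow Vulnerability</span>'
    '<span class="title">Some Other Advisory</span>'
    '<span class="date">2007-02-11</span>'
    '</body></html>'
)
print(find_titles(sample))
```

Unlike the one-line find() call, this returns a list, so it also
covers pages that carry several matching titles.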
--
Ayaz Ahmed Khan
A witty saying proves nothing, but saying something pointless gets
people's attention.