BeautifulSoup

Brian J Mingus Brian.Mingus at Colorado.EDU
Fri Jan 15 15:59:09 EST 2010


On Wed, Jan 13, 2010 at 5:46 AM, yamamoto <blueskykind02 at gmail.com> wrote:
> Hi,
> I am new to Python. I'd like to extract "a" tag from a website by
> using "beautifulsoup" module.
> but it doesnt work!
>
> //sample.py
>
> from BeautifulSoup import BeautifulSoup as bs
> import urllib
> url="http://www.d-addicts.com/forum/torrents.php"
> doc=urllib.urlopen(url).read()
> soup=bs(doc)
> result=soup.findAll("a")
> for i in result:
>    print i
>
>
> Traceback (most recent call last):
>  File "C:\Users\falcon\workspace\p\pyqt\ex1.py", line 8, in <module>
>    soup=bs(doc)
>  File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 1499, in
> __init__
>    BeautifulStoneSoup.__init__(self, *args, **kwargs)
>  File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 1230, in
> __init__
>    self._feed(isHTML=isHTML)
>  File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 1263, in
> _feed
>    self.builder.feed(markup)
>  File "C:\Python26\lib\HTMLParser.py", line 108, in feed
>    self.goahead(0)
>  File "C:\Python26\lib\HTMLParser.py", line 148, in goahead
>    k = self.parse_starttag(i)
>  File "C:\Python26\lib\HTMLParser.py", line 226, in parse_starttag
>    endpos = self.check_for_whole_start_tag(i)
>  File "C:\Python26\lib\HTMLParser.py", line 301, in
> check_for_whole_start_tag
>    self.error("malformed start tag")
>  File "C:\Python26\lib\HTMLParser.py", line 115, in error
>    raise HTMLParseError(message, self.getpos())
> HTMLParser.HTMLParseError: malformed start tag, at line 276, column 36
>
> any suggestion?
> thanks in advance
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

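BeautifulSoup is overkill for this anyway. If all you need are the href
values, plain string splitting is enough, and it never touches the HTML
parser that is choking on the malformed tag in that page: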

#!/bin/python
from urllib import urlopen
html = urlopen("http://www.d-addicts.com/forum/torrents.php").read()
# Skip the first chunk: it is the text before the first href, not a link.
links = set([link.split('"')[0] for link in html.split('href="')[1:]])
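For anyone who wants to try the split-on-href idea without hitting the network, here is a minimal sketch of the same technique wrapped in a function. It is written for Python 3, which is my assumption (the thread uses Python 2), and the names extract_hrefs, fetch_links and the sample markup are made up for illustration. It only sees hrefs written with double quotes, so treat it as a quick hack rather than a real HTML parser.

```python
from urllib.request import urlopen  # Python 3 location of urlopen


def extract_hrefs(html):
    """Return the set of double-quoted href values found in an HTML string."""
    # Everything before the first 'href="' is not a link, so skip chunk 0.
    chunks = html.split('href="')[1:]
    # Each remaining chunk starts with the URL and runs up to the closing quote.
    return set(chunk.split('"')[0] for chunk in chunks)


def fetch_links(url):
    """Download a page and extract its links (needs network access)."""
    # Decoding as UTF-8 with errors="replace" is an assumption; the real
    # encoding of the page is not known here.
    html = urlopen(url).read().decode("utf-8", errors="replace")
    return extract_hrefs(html)


# Hypothetical sample markup, with one duplicate link to show the set at work.
sample = (
    '<p><a href="/a.html">A</a> '
    '<a href="http://example.com/b">B</a> '
    '<a href="/a.html">again</a></p>'
)
links = extract_hrefs(sample)
print(sorted(links))
```

Calling fetch_links("http://www.d-addicts.com/forum/torrents.php") would apply the same extraction to the page from the original question. Using a set drops duplicate links, which is usually what you want when listing the URLs on a page.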