HTMLParser.HTMLParseError: EOF in middle of construct
nagle at animats.com
Wed Jun 20 06:19:09 CEST 2007
> Gabriel Genellina wrote:
>> On Mon, 18 Jun 2007 16:38:18 -0300, Sergio Monteiro Basto
>> <sergio at sergiomb.no-ip.org> wrote:
>>> Can someone explain to me what is wrong with this site?
>>> python linkExtractor3.py http://www.noticiasdeaveiro.pt > test
> OK, but my problem is that I don't understand what the specific problem is at line
>> HTMLParser expects valid HTML - try a different tool, like
>> BeautifulSoup, which is specially designed to handle malformed pages.
>> --Gabriel Genellina
Yes, you almost have to use BeautifulSoup on real-world web pages.
Even that may not be enough; I have my own even more robust version of
BeautifulSoup. (I've sent the fixes, which are small, to the author.)
The usual BeautifulSoup killer is improperly terminated HTML comments. The
default action is to suck up the rest of the entire document into
the comment, which is usually not what you want. I have a fix for that.
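
To illustrate the comment problem, here is a minimal sketch of one possible workaround. It is not the BeautifulSoup fix mentioned above, and it assumes Python 3's html.parser rather than the Python 2 HTMLParser module from the original question. The names close_unterminated_comments, LinkExtractor and extract_links are made up for this example, and ending a broken comment at the next ">" is only one heuristic among several that might be reasonable.

```python
from html.parser import HTMLParser


def close_unterminated_comments(html):
    """Return html with any comment that has no '-->' ended at the next '>'.

    Heuristic (an assumption, not a standard): a comment opener with no
    terminator anywhere after it is closed at the first '>' that follows,
    or at the end of the document if there is none.
    """
    out = []
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            # No more comments: copy the remainder and stop.
            out.append(html[pos:])
            break
        out.append(html[pos:start])
        end = html.find("-->", start + 4)
        if end != -1:
            # Properly terminated comment: copy it through unchanged.
            out.append(html[start:end + 3])
            pos = end + 3
            continue
        # No terminator anywhere: end the comment early instead of
        # letting it swallow the rest of the document.
        gt = html.find(">", start + 4)
        if gt == -1:
            gt = len(html)
        out.append(html[start:gt] + "-->")
        pos = min(gt + 1, len(html))
    return "".join(out)


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Repair broken comments first, then extract links."""
    parser = LinkExtractor()
    parser.feed(close_unterminated_comments(html))
    parser.close()
    return parser.links


page = '<a href="/one">1</a><!-- oops, never closed ><a href="/two">2</a>'
print(extract_links(page))
```

Note that this only catches a comment with no terminator at all after it. If a broken comment is followed later by a well-formed one, the broken comment will still run up to that later "-->", so a real fix would need a smarter rule for where a comment is allowed to end.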