HTML parsing bug?
R.Brodie at rl.ac.uk
Mon Jan 30 15:59:00 CET 2006
<g_no_mail_please at yahoo.com> wrote in message
news:1138632328.306349.241430 at g44g2000cwa.googlegroups.com...
> Python 2.3.5 seems to choke when trying to parse html files, because it
> doesn't realize that what's inside <!-- --> is a comment in HTML,
> even if this comment is inside <script> </script>, especially if it's a
> comment inside that script code too.
Actually, you are technically incorrect; try validating the code you posted.
Google found this explanation: http://lachy.id.au/log/2005/05/script-comments
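One quick way to see the point about script content is to ask a parser what it reports. The sketch below uses Python 3's html.parser module. This is an assumption: it is not the Python 2.3.5 parser from the original question, whose behaviour differed. ScriptInspector and the sample page are hypothetical. A parser that follows the HTML rules treats the body of <script> as raw text. It passes that body, including the <!-- and --> markers, to handle_data. Only the comment outside the script reaches handle_comment.

```python
from html.parser import HTMLParser  # Python 3 stdlib, not the 2.3.5 HTMLParser


class ScriptInspector(HTMLParser):
    """Hypothetical helper: records script text and reported comments."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_text = []  # raw text chunks seen inside <script>
        self.comments = []     # everything the parser reported as a comment

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content arrives here as plain data, "<!--" and all.
        if self.in_script:
            self.script_text.append(data)

    def handle_comment(self, data):
        self.comments.append(data)


page = """<script type="text/javascript"><!--
var x = 1; // a JavaScript comment
// --></script>
<!-- a real HTML comment -->"""

p = ScriptInspector()
p.feed(page)
p.close()

print(repr("".join(p.script_text)))
print(p.comments)
```

The first print shows the <!-- and // --> markers as part of the script text. The second shows that only the HTML comment outside the script was reported as a comment.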
Feeding even slightly invalid HTML to the standard library parser will often make it choke. If you can't guarantee clean sources, it's best to run them through Tidy first, or to use another parser.
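Tidy and third-party parsers are not in the standard library. The following sketch therefore substitutes stdlib modules to illustrate the fallback idea. The strict xml.etree.ElementTree parser stands in for a parser that chokes on invalid markup. Python 3's lenient html.parser stands in for "another parser". Both choices are assumptions, and extract_text and TextCollector are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Stand-ins (assumption): ElementTree plays the strict parser,
# html.parser plays the more forgiving alternative.


class TextCollector(HTMLParser):
    """Hypothetical lenient fallback: collects non-blank text nodes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def extract_text(markup):
    """Try the strict parser first; fall back to the lenient one on failure.

    Returns (parser_used, list_of_text_chunks).
    """
    try:
        root = ET.fromstring(markup)  # raises ParseError on malformed markup
        return "strict", [t.strip() for t in root.itertext() if t.strip()]
    except ET.ParseError:
        collector = TextCollector()
        collector.feed(markup)
        collector.close()
        return "lenient", collector.chunks


print(extract_text("<p>Hello <b>world</b></p>"))
print(extract_text("<p>Hello <br> world"))  # unclosed <br> and <p>
```

The well-formed input goes through the strict path. The input with unclosed tags makes ElementTree raise ParseError, so the same text is recovered by the lenient parser instead.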