[New-bugs-announce] [issue10035] sgmllib fail to parse html containing <!- .... ->
report at bugs.python.org
Wed Oct 6 06:28:02 CEST 2010
New submission from halfjuice <halfjuice at gmail.com>:
When parsing HTML containing the following tag:
... <!- ie6 doesn't allow empty div. -> ...
SGMLParser stops parsing the content that follows, without any warning. When the tag is removed, everything works fine.
Looking into sgmllib.py, I found the statement below:
    if rawdata.startswith("<!", i):
        # This is some sort of declaration; in "HTML as
        # deployed," this should only be the document type
        # declaration ("<!DOCTYPE html...>").
I think that's why something goes wrong here.
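
A minimal sketch of the suspected dispatch, to show why a single-dash "<!-" would reach the declaration branch. This is not the actual sgmllib code: classify_markup is a hypothetical helper, and the assumption that a "<!--" comment check runs before the "<!" check is mine.

```python
def classify_markup(rawdata, i):
    """Hypothetical helper: say which branch would handle the text at index i.

    Assumption: a well-formed comment opener ("<!--") is tested first,
    and anything else starting with "<!" is treated as a declaration.
    """
    if rawdata.startswith("<!--", i):
        return "comment"
    if rawdata.startswith("<!", i):
        # Same test as the statement quoted above: it also matches "<!-".
        return "declaration"
    return "other"


if __name__ == "__main__":
    samples = [
        "<!-- a well-formed comment -->",
        "<!DOCTYPE html>",
        "<!- ie6 doesn't allow empty div. ->",
    ]
    for text in samples:
        print(text, "=>", classify_markup(text, 0))
```

Under these assumptions the malformed "<!- ... ->" is not recognized as a comment and is handed to whatever parses declarations, which would be consistent with the behaviour reported here.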
components: Library (Lib)
title: sgmllib fail to parse html containing <!- .... ->
versions: Python 2.6
Python tracker <report at bugs.python.org>