[Chicago] sgmlparser problem

Tue Dec 12 01:34:20 CET 2006

Yeah, that is one solution. It does work, but instead of skipping the bad HTML 
I am fixing it and then throwing it away, which seems kind of odd.
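
For the archive, here is a minimal sketch of the pre-cleaning approach. It assumes the only breakage is whitespace between "<!" and "--", as in the error quoted below. The helper names (repair_comment_openers, drop_comments) are made up for illustration and are not part of sgmllib or BeautifulSoup. The string would be cleaned before it is passed to the parser's feed().

```python
import re

# "<!" followed by whitespace and then "--", e.g. "<! -- NEW PAGE -->".
BROKEN_COMMENT_OPEN = re.compile(r"<!\s+--")


def repair_comment_openers(html):
    """Rewrite '<! --' as '<!--' so a strict parser sees a normal comment."""
    return BROKEN_COMMENT_OPEN.sub("<!--", html)


def drop_comments(html):
    """Repair the openers first, then remove each comment entirely."""
    repaired = repair_comment_openers(html)
    return re.sub(r"<!--.*?-->", "", repaired, flags=re.DOTALL)


page = "<p>one</p><! -- NEW PAGE --><p>two</p>"
cleaned = repair_comment_openers(page)   # comment kept, opener fixed
skipped = drop_comments(page)            # comment removed altogether
```

If the comments are never needed, drop_comments gets closer to "skipping" than fixing, since nothing repaired is handed to the parser only to be discarded afterwards.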

Thanks,
Lucas

David Terrell wrote:
> Bad HTML markup should probably go through BeautifulSoup, which
> tries to deal with this kind of awfulness.
>
> http://www.crummy.com/software/BeautifulSoup/
>
>
> On Sun, Dec 10, 2006 at 05:35:12PM -0600, Lukasz Szybalski wrote:
>   
>> Hello,
>> Would you guys know how to bypass this error I'm getting from the SGML parser?
>>
>> expected name token at '<! -- NEW PAGE -'
>>
>> Obviously the <! -- should be <!-- 
>>
>> How can I tell sgmlparser to move on and/or skip invalid HTML?
>>
>>
>> File "/usr/lib/python2.4/sgmllib.py", line 95, in feed
>>     self.goahead(0)
>>   File "/usr/lib/python2.4/sgmllib.py", line 165, in goahead
>>     k = self.parse_declaration(i)
>>   File "/usr/lib/python2.4/markupbase.py", line 95, in parse_declaration
>>     decltype, j = self._scan_name(j, i)
>>   File "/usr/lib/python2.4/markupbase.py", line 384, in _scan_name
>>     self.error("expected name token at %r"
>>   File "/usr/lib/python2.4/sgmllib.py", line 102, in error
>>     raise SGMLParseError(message)
>> sgmllib.SGMLParseError: expected name token at '<! -- NEW PAGE -
>>
>> thanks
>> Lucas