"<!" in SGMLParser - an error ?

Amit Weisman weismann at netvision.net.il
Sat Nov 10 21:55:05 CET 2001


I'm trying. :)
I have a parser for HTML pages. My program scans them automatically.
After the parser had handled 500 pages without errors, it failed with this
error message:

Traceback (most recent call last):
  File "<pyshell#103>", line 1, in ?
    a.feed(b)
  File "C:\Python21\lib\sgmllib.py", line 91, in feed
    self.goahead(0)
  File "C:\Python21\lib\sgmllib.py", line 158, in goahead
    k = self.parse_declaration(i)
  File "C:\Python21\lib\sgmllib.py", line 238, in parse_declaration
    raise SGMLParseError(
SGMLParseError: unexpected char in declaration: '<'


I have attached to this message a sample parser that should print the
comments to the screen.
When I feed this parser a "correct" string such as
>>> b = """<html>
<!--  Regular comment  -->

<head>
  <title></title>
</head>
<body>
bla bla bla
</body>
</html>

"""
>>> a = sample()
>>> a.feed(b)
  Regular comment


That works fine.
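
The attached sample.py is not preserved in the archive, so the following is only a rough sketch of the kind of comment-printing parser described above, not the original file. It assumes a current Python 3, where sgmllib no longer exists, and uses html.parser.HTMLParser as a stand-in. The class name CommentPrinter and its comments attribute are made up for illustration.

```python
from html.parser import HTMLParser


class CommentPrinter(HTMLParser):
    """Hypothetical stand-in for the scrubbed sample.py: prints each comment."""

    def __init__(self):
        super().__init__()
        self.comments = []  # kept so callers can inspect what was seen

    def handle_comment(self, data):
        # Called with the text found between "<!--" and "-->".
        self.comments.append(data)
        print(data)


page = """<html>
<!--  Regular comment  -->
<head>
  <title></title>
</head>
<body>
bla bla bla
</body>
</html>
"""

parser = CommentPrinter()
parser.feed(page)
parser.close()
```

Only handle_comment is overridden here; every other piece of markup falls through to the base class's do-nothing handlers.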

But with a string that contains "<   bla bla bla>" or "<! bla bla>", I get
an error message:

>>> b = """<html>
< eee >

<head>
  <title></title>
</head>
<body>
<! ddd >
bla bla bla
</body>
</html>

"""
>>> a = sample()
>>> a.feed(b)
Traceback (most recent call last):
  File "<pyshell#118>", line 1, in ?
    a.feed(b)
  File "C:\Python21\lib\sgmllib.py", line 91, in feed
    self.goahead(0)
  File "C:\Python21\lib\sgmllib.py", line 158, in goahead
    k = self.parse_declaration(i)
  File "C:\Python21\lib\sgmllib.py", line 238, in parse_declaration
    raise SGMLParseError(
SGMLParseError: unexpected char in declaration: '<'
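
One possible way to get the "ignore it" behaviour asked about in this thread could be to escape stray "<" characters before the text ever reaches the parser. The sketch below is an assumption, not something taken from sgmllib or from the attached sample: the helper name escape_stray_lt and the list of openers treated as legitimate markup (tags, end tags, comments, DOCTYPE, processing instructions, marked sections) are hypothetical choices.

```python
import re

# Assumption: a "<" is real markup only when followed by a letter, "/", "?",
# "!--", "!doctype" or "![". Any other "<" is treated as literal text.
_STRAY_LT = re.compile(r"<(?![a-z/?]|!--|!doctype|!\[)", re.IGNORECASE)


def escape_stray_lt(html):
    """Return html with every stray "<" replaced by "&lt;"."""
    return _STRAY_LT.sub("&lt;", html)


page = """<html>
< eee >
<body>
<! ddd >
bla bla bla
</body>
</html>
"""

cleaned = escape_stray_lt(page)
print(cleaned)
```

The cleaned string would then be passed to feed() in place of the original. Whether this avoids the exact SGMLParseError on Python 2.1 has not been verified here.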

I hope I was clearer this time.
I'm new to Python and to this newsgroup. Sorry.

Thanks
Amit


----- Original Message -----
From: "Hernan M. Foffani" <hfoffani at yahoo.com>
Newsgroups: comp.lang.python
To: <python-list at python.org>
Sent: Saturday, November 10, 2001 8:00 PM
Subject: Re: "<!" in SGMLParser - an error ?


> "Amit Weisman" <weismann at netvision.net.il> wrote in message
> news:mailman.1005403215.5502.python-list at python.org...
> > I don't know what comes after the "<". It's not "<!"; those are
> > processed by handle_comment.
> > How can I parse (or, even better, ignore) the "<"?
>
> Amit,
> I think it would be best if you told us exactly what you want to do
> and gave us a sample input too.
> I promise you'll get a better answer from us.
>
> Regards,
> -Hernan

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sample.py
URL: <http://mail.python.org/pipermail/python-list/attachments/20011110/396b9ecc/attachment.ksh>

