sgmllib.py not good at handling <br/>
Chris Withers
chrisw at nipltd.com
Mon May 14 08:22:13 EDT 2001
Hi,
I posted this to the bug tracker:
http://sourceforge.net/tracker/?func=detail&aid=423779&group_id=5470&atid=105470
...but it's holding me up badly so I thought I'd ask here too in the hope that
one of you kind souls can help out :-)
When parsing the following HTML:
'Roses <b>are</B> red,<br/>violets <i>are</i> blue'
...with the following class:
import sgmllib

class HTML2SafeHTML(sgmllib.SGMLParser):
    # Print every parser event so the tokenisation is visible.

    def handle_data(self, data):
        print "***data***"
        print data

    def unknown_starttag(self, tag, attrs):
        print "***start**"
        print tag
        print attrs

    def unknown_endtag(self, tag):
        print "***end**"
        print tag
I get the following output, which isn't right :-S
***data***
Roses
***start**
b
[]
***data***
are
***end**
b
***data***
red,
***start**
br
[]
***data***
>violets <i>are<
***end**
br
***data***
i> blue
Any idea what's broken, where, and how to fix it? I get the same with
htmllib.py in Python 1.5.2, 2.0 and the latest from CVS.
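One possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: sgmllib seems to treat '<br/' as the start of an SGML short tag of the form '<tag/data/', so everything up to the next '/' is reported as data inside 'br'. If that is what is happening, a stopgap is to rewrite XML-style empty tags before feeding the parser. The sketch below is only that; the names EMPTY_TAG and strip_empty_tag_slash are made up for illustration, and the regular expression does not cope with '>' inside attribute values.

```python
import re

# Hypothetical helper, not part of sgmllib: matches XML-style empty tags
# such as <br/>, <br /> or <img src="x" /> and captures the tag name and
# any attribute text that precedes the closing "/>".
EMPTY_TAG = re.compile(r'<([a-zA-Z][-.a-zA-Z0-9]*)([^<>]*?)\s*/>')

def strip_empty_tag_slash(html):
    # Rewrite <tag ... /> as <tag ...> so the trailing slash never
    # reaches the parser; all other markup is left untouched.
    return EMPTY_TAG.sub(r'<\1\2>', html)

source = 'Roses <b>are</B> red,<br/>violets <i>are</i> blue'
cleaned = strip_empty_tag_slash(source)
```

The cleaned string could then be passed to the feed() method of the parser in place of the original text.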
cheers,
Chris