[lxml-dev] A bit of oddness in the HTML parser

27 Mar 2007, 1:06 p.m.
I'm seeing that the HTML parser is doing something undesirable.
If I have (note, the script tag is not closed):
from lxml.etree import HTML, tostring
s = '<div>d_txt<script src="blah">s_tail</div>'
and I use the HTML parser on that string, I get the ending div html-escaped in the script's text.
r = HTML(s)
tostring(r)
'<html><body><div>d_txt<script src="blah">s_tail&lt;/div&gt;</script></div></body></html>'
I'm guessing this is upstream behavior? I was hoping to get
'<html><body><div>d_txt<script src="blah">s_tail</script></div></body></html>'
I think I can live with this behavior if nobody else thinks this is a bug. Yes, I realize that tag-soup parsers are hard to do. :)
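One possible explanation, offered as an assumption rather than a confirmed diagnosis: HTML parsers generally treat script content as raw text until a matching closing script tag, so the stray closing div would be swallowed into the script's text and then escaped when the tree is serialized as XML. The sketch below only illustrates that escaping step. It uses the standard library's xml.etree.ElementTree instead of lxml, and it builds the tree by hand instead of parsing the string above.

```python
import xml.etree.ElementTree as ET

# Build by hand the tree the parser appears to produce (an assumption):
# the unclosed <script> has absorbed "</div>" into its text content.
div = ET.Element("div")
div.text = "d_txt"
script = ET.SubElement(div, "script", src="blah")
script.text = "s_tail</div>"

# XML serialization escapes "<" and ">" inside text nodes, so the
# absorbed closing tag comes back out as entities.
serialized = ET.tostring(div, encoding="unicode")
print(serialized)
```

If that reading is right, the escaped closing div is ordinary text belonging to the script element, which would explain why it sits inside the script element in the serialized output.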
-Jim Washington