Re: [lxml-dev] A bit of oddness in the HTML parser

27 Mar 2007, 7:42 a.m.
Jim Washington wrote:
If I have
s = '<div>d_txt<script src="blah">s_tail</div>'
and I use the HTML parser on that string, I get the closing div tag HTML-escaped in the script's text:
r = HTML(s)
tostring(r)
'<html><body><div>d_txt<script src="blah">s_tail&lt;/div&gt;</script></div></body></html>'
I'm guessing this is upstream behavior?
Definitely. There is always a limit to what a machine can fix in tag soup. Feel free to file an enhancement request for libxml2's HTML parser.
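Why the stray end tag ends up inside the script can be illustrated with a deliberately simplified Python model of the raw-text rule that HTML parsers generally apply to `<script>` content. Once the start tag is seen, only a matching `</script>` ends the element, so other end tags such as `</div>` are kept as text, and an unclosed script runs to the end of the input. This is a sketch under that assumption, not libxml2's actual parsing code (its recovery logic is more involved). `script_raw_text` is a hypothetical helper, and `xml.sax.saxutils.escape` stands in for the escaping an XML serializer applies to text content.

```python
import re
from xml.sax.saxutils import escape

# Simplified model (an assumption, not libxml2's code): after a <script>
# start tag, everything up to a matching </script> end tag is raw text.
# Other end tags such as </div> are NOT treated as markup inside it.
SCRIPT_START = re.compile(r"<script\b[^>]*>", re.IGNORECASE)
SCRIPT_END = re.compile(r"</script\s*>", re.IGNORECASE)


def script_raw_text(markup: str) -> str:
    """Return the raw text content of the first <script> element in markup."""
    start_tag = SCRIPT_START.search(markup)
    if start_tag is None:
        raise ValueError("no <script> start tag found")
    body_start = start_tag.end()
    end_tag = SCRIPT_END.search(markup, body_start)
    # An unclosed <script> swallows the rest of the input.
    body_end = end_tag.start() if end_tag else len(markup)
    return markup[body_start:body_end]


s = '<div>d_txt<script src="blah">s_tail</div>'
text = script_raw_text(s)
print(text)          # prints: s_tail</div>
# Serializing that text as XML escapes the angle brackets:
print(escape(text))  # prints: s_tail&lt;/div&gt;
```

Under this model the parser has no way to know the author meant `</div>` to close the outer element; recovering that intent would require guessing, which is the kind of tag-soup repair that has to be requested upstream.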
Stefan
Stefan Behnel