Problem when writing source tag in HTML5 video tag

Hi everyone!
I'm using lxml.html to read and write an HTML5 file. Now I noticed that if the HTML5 file contains a <video> element which contains a <source> element, a </source> is generated when writing the file back, which shouldn't be there.
Minimal example:
---------8<---------8<---------8<---------8<---------8<---------
import lxml.html
data = """<!DOCTYPE html> <html> <head> <title>1</title> </head> <body> <video> <source src="1.ogv" type="video/ogg"> </video> </body> </html>"""
parser = lxml.html.HTMLParser()
doc = lxml.html.document_fromstring(data, parser)
data = b'<!DOCTYPE html>\n' + lxml.html.tostring(doc)
result = data.decode('utf-8')
assert result == """<!DOCTYPE html> <html> <head> <title>1</title> </head> <body> <video> <source src="1.ogv" type="video/ogg"> </source></video> </body> </html>"""
---------8<---------8<---------8<---------8<---------8<---------
Am I doing something wrong? Or how can I get rid of </source>?
I also tried lxml's html5parser, but lxml.html.tostring() still produced a </source> tag (and every tag got an "html:" namespace prefix).
Thanks a lot and best regards, Felix

Hi,
> I'm using lxml.html to read and write an HTML5 file. Now I noticed that if the HTML5 file contains a <video> element which contains a <source> element, a </source> is generated when writing the file back, which shouldn't be there.
> [...]
> Am I doing something wrong? Or how can I get rid of </source>?
I traced that back through the sources; the problem seems to be libxml2, whose HTML serializer only knows the HTML 4 tags and therefore does not treat <source> as an empty element.
Best, Felix
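
If the serializer really does only know the HTML 4 empty elements, one possible workaround is to post-process the serialized text and drop the stray end tags. The following is only a minimal sketch under that assumption: strip_void_end_tags and HTML5_VOID_TAGS are hypothetical names, not part of lxml or libxml2, and the tag list is a guess at which HTML5 void elements are affected. It works on plain text with a regular expression, so it would also alter a literal "</source>" inside a script or comment.

```python
import re

# Assumption: these HTML5 void elements are unknown to an HTML 4 based
# serializer and may therefore get an explicit end tag. Adjust as needed.
HTML5_VOID_TAGS = ("source", "track", "wbr")


def strip_void_end_tags(html, tags=HTML5_VOID_TAGS):
    """Remove end tags such as </source> from serialized HTML text."""
    names = "|".join(re.escape(tag) for tag in tags)
    # Matches e.g. "</source>" or "</SOURCE >", ignoring case.
    pattern = re.compile(r"</(?:%s)\s*>" % names, re.IGNORECASE)
    return pattern.sub("", html)


# Fragment taken from the output shown in the minimal example above.
serialized = '<video> <source src="1.ogv" type="video/ogg"> </source></video>'
cleaned = strip_void_end_tags(serialized)
print(cleaned)
```

In the minimal example, this would be applied to result, i.e. to the decoded return value of lxml.html.tostring(). If the serializer does not emit </source> in the first place, the function leaves the text unchanged.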
Participants (1): Felix Fontein