Problem when writing source tag in HTML5 video tag
data:image/s3,"s3://crabby-images/0d47e/0d47ea91611a1c1b692e9c96b81cedb1615350ba" alt=""
Hi everyone!

I'm using lxml.html to read and write an HTML5 file. I noticed that if the HTML5 file contains a <video> element with a <source> element inside it, writing the file back produces a </source> end tag that shouldn't be there. Minimal example:

---------8<---------8<---------8<---------8<---------8<---------
import lxml.html

data = """<!DOCTYPE html>
<html>
<head>
<title>1</title>
</head>
<body>
<video>
<source src="1.ogv" type="video/ogg">
</video>
</body>
</html>"""

parser = lxml.html.HTMLParser()
doc = lxml.html.document_fromstring(data, parser)
data = b'<!DOCTYPE html>\n' + lxml.html.tostring(doc)
result = data.decode('utf-8')
assert result == """<!DOCTYPE html>
<html>
<head>
<title>1</title>
</head>
<body>
<video>
<source src="1.ogv" type="video/ogg">
</source></video>
</body>
</html>"""
---------8<---------8<---------8<---------8<---------8<---------

Am I doing something wrong? Or how can I get rid of </source>?

I also tried using lxml's html5parser, but lxml.html.tostring() also produced a </source> tag (and there was an "html:" namespace prefix in every tag).

Thanks a lot and best regards,
Felix

-- 
Felix Fontein -- felix@fontein.de -- https://felix.fontein.de/
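One possible workaround for the question above, sketched here as an assumption rather than anything lxml itself provides: post-process the serialized string and drop end tags of HTML5 void elements, which never have an end tag. The names HTML5_VOID_ELEMENTS and strip_void_end_tags are hypothetical, introduced only for this sketch.

```python
import re

# HTML5 void elements (WHATWG HTML spec): they never have an end tag.
HTML5_VOID_ELEMENTS = (
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
)

# Matches a stray end tag such as </source> or </SOURCE >, but not
# </colgroup> or </basefont>, because ">" must follow the name directly
# (optionally after whitespace).
_VOID_END_TAG = re.compile(
    r"</(?:%s)\s*>" % "|".join(HTML5_VOID_ELEMENTS),
    re.IGNORECASE,
)


def strip_void_end_tags(html: str) -> str:
    """Remove end tags that HTML5 forbids on void elements.

    This is plain text substitution, not parsing: it would also remove a
    literal "</source>" inside <script> or <style> raw text.
    """
    return _VOID_END_TAG.sub("", html)


serialized = '<video>\n<source src="1.ogv" type="video/ogg">\n</source></video>'
print(strip_void_end_tags(serialized))
```

With the code from the question, this would be applied as result = strip_void_end_tags(data.decode('utf-8')). A regex is crude, but it leaves the parsed tree untouched and only patches the output.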
data:image/s3,"s3://crabby-images/0d47e/0d47ea91611a1c1b692e9c96b81cedb1615350ba" alt=""
Hi,
I traced that back through the sources; the problem seems to be libxml2, whose HTML serializer only knows the HTML 4 elements. Since <source> was introduced with HTML5, it is not in that list, so libxml2 does not treat it as a void element and writes an explicit </source> end tag.

Best,
Felix
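Since the limitation lies in the serializer's built-in element table, another approach is to serialize the tree with an HTML5 void-element list of your own. The following is a minimal sketch written against the standard library's xml.etree.ElementTree; serialize_html5 is a hypothetical helper, not an lxml API. It only uses .tag, .attrib, .text, .tail and child iteration, which lxml elements also provide, so it may also work on the doc from the question, but that is an untested assumption. It skips comments and processing instructions, does not handle namespaces (so not the "html:"-prefixed tags from html5parser), and escapes <script>/<style> contents like ordinary text.

```python
import xml.etree.ElementTree as ET
from html import escape

# HTML5 void elements (WHATWG HTML spec): they never get an end tag.
HTML5_VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


def serialize_html5(elem) -> str:
    """Serialize an element tree as HTML, omitting end tags of void elements.

    Simplified sketch: comments/processing instructions are skipped (their
    tail text is kept), namespaces are not handled, and <script>/<style>
    contents are escaped like ordinary text.
    """
    parts = []

    def walk(node) -> None:
        # Comments and PIs have a non-string .tag; only real elements are written.
        if isinstance(node.tag, str):
            attrs = "".join(
                ' %s="%s"' % (name, escape(value, quote=True))
                for name, value in node.attrib.items()
            )
            parts.append("<%s%s>" % (node.tag, attrs))
            if node.tag.lower() not in HTML5_VOID_ELEMENTS:
                # Normal element: content followed by an explicit end tag.
                if node.text:
                    parts.append(escape(node.text, quote=False))
                for child in node:
                    walk(child)
                parts.append("</%s>" % node.tag)
            # Void elements get neither content nor an end tag.
        # Text following the node belongs to the parent and is always kept.
        if node.tail:
            parts.append(escape(node.tail, quote=False))

    walk(elem)
    return "".join(parts)


root = ET.fromstring('<video><source src="1.ogv" type="video/ogg"/></video>')
print(serialize_html5(root))
# → <video><source src="1.ogv" type="video/ogg"></video>
```

The DOCTYPE is not part of the element tree here either, so it would still be prepended by hand, as in the original example.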
data:image/s3,"s3://crabby-images/0d47e/0d47ea91611a1c1b692e9c96b81cedb1615350ba" alt=""
Hi,
I traced that back through the sources; the problem seems to be libxml2, whose HTML serializer only knows the HTML 4 tags. Best, Felix -- Felix Fontein -- felix@fontein.de -- https://felix.fontein.de/
Participants (1): Felix Fontein