Is there a way to stop the HTMLParser rewriting invalid HTML tags?

Nov. 25, 2015
9:40 a.m.
Hello, I have a question regarding the HTMLParser in lxml. Is there a way to make it less strict? As I currently have it configured, it rewrites <h1><p>Hello</p></h1> to <h1></h1><p>Hello</p>. The HTML spec does not allow <p> tags inside <h1> tags. However, for my specific use case, I would like to keep the HTML as close to the original as possible, even if it is invalid. The code I am currently using is as follows:
I am using lxml 3.5. Is there a way to disable this reordering of tags so that the <p> remains a child of the <h1>? Thanks!
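For illustration only, and not as an lxml setting or the code from the original post: one way to see the markup exactly as written is to avoid an HTML tree parser altogether. The sketch below assumes the standard library's html.parser tokenizer is an acceptable substitute; it applies no HTML nesting rules, so a <p> stays inside its <h1>. The class name VerbatimTreeBuilder and its finish() method are hypothetical helpers, and the sketch assumes the input has a single top-level element.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Void elements never have an end tag, so they are closed as soon as they open.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class VerbatimTreeBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree from tags as they appear,
    without moving children out of parents that HTML forbids."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._builder = ET.TreeBuilder()
        self._open = []  # names of the elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        self._builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID:
            self._builder.end(tag)
        else:
            self._open.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens and closes in one step.
        self._builder.start(tag, {k: v or "" for k, v in attrs})
        self._builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in self._open:
            return  # stray end tag with no matching start: ignore it
        # Close any unclosed descendants, then the element itself.
        while self._open:
            name = self._open.pop()
            self._builder.end(name)
            if name == tag:
                break

    def handle_data(self, data):
        if self._open:  # text outside the root element is dropped
            self._builder.data(data)

    def finish(self):
        self.close()  # flush any input still buffered by HTMLParser
        while self._open:
            self._builder.end(self._open.pop())
        return self._builder.close()


parser = VerbatimTreeBuilder()
parser.feed("<h1><p>Hello</p></h1>")
root = parser.finish()
print(ET.tostring(root, encoding="unicode"))  # <h1><p>Hello</p></h1>
```

The resulting tree is a plain xml.etree.ElementTree element rather than an lxml one, so lxml-specific features such as XPath via lxml.etree are not available on it without a further conversion step.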
Austin Platt