[lxml-dev] html5lib tree builder in lxml 2.2
Hi,

Getting around to actually looking at lxml 2.2's html5lib support, I note that it has its own treebuilder: I presume there was some reason (bugs?) that html5lib's own wasn't used.

Would it be possible to get a patch for html5lib that would fix these issues (this'll need to be under the MIT license)?

--
Geoffrey Sneddon
<http://gsnedders.com/>
Geoffrey Sneddon wrote:
> Getting around to actually looking at lxml 2.2's html5lib support, I note that it has its own treebuilder: I presume there was some reason (bugs?) that html5lib's own wasn't used.

Armin Ronacher wrote that part, so he should know:
http://comments.gmane.org/gmane.comp.python.lxml.devel/3848?set_lines=100000
It uses a subclass of html5lib's TreeBuilder, so it's not a rewrite or anything of that order.

> Would it be possible to get a patch for html5lib that would fix these issues (this'll need to be under the MIT license)?

It's mainly about stuff that ET doesn't support, such as the DOCTYPE or top-level comments. I don't know if the html5lib project is interested in that, but it shouldn't be too hard to add some conditional lxml specifics to their code.

Stefan
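As a small illustration of the limitation mentioned above, the following sketch uses only the standard library's `xml.etree.ElementTree` (taken here to be the "ET" in question). The sample document and variable names are made up for the example; it is not code from lxml or html5lib. It shows that a DOCTYPE and a comment placed before the root element do not survive a parse/serialize round trip, because the tree only holds the root element and its descendants.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: a DOCTYPE and a comment both sit
# outside (before) the root element.
DOCUMENT = (
    "<!DOCTYPE html>"
    "<!-- top-level comment -->"
    "<html><body><p>hi</p></body></html>"
)

# The default parser returns only the root element; nothing that
# precedes it is attached to the tree.
root = ET.fromstring(DOCUMENT)

# Serializing the tree again therefore yields just the element content.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The printed result contains neither the DOCTYPE nor the comment, which is the kind of information a tree builder has to carry separately if it wants to keep it.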
On 25 Mar 2009, at 21:44, Stefan Behnel wrote:
>> Would it be possible to get a patch for html5lib that would fix these issues (this'll need to be under the MIT license)?
> It's mainly about stuff that ET doesn't support, such as the DOCTYPE, or top-level comments. I don't know if the html5lib project is interested in that, but it shouldn't be too hard to add some conditional lxml specifics to their code.

There is already a whole separate lxml treebuilder in html5lib. I'm in part wondering why that wasn't used verbatim, and whether, if any issues with it are fixed in lxml 2.2's treebuilder, a patch could be made available under licensing terms acceptable to html5lib. (I'd probably look more closely to see quite what was changed if I could actually copy changes safely, the licensing being what it is.)

--
Geoffrey Sneddon
<http://gsnedders.com/>
Geoffrey Sneddon wrote:
> On 25 Mar 2009, at 21:44, Stefan Behnel wrote:
>>> Would it be possible to get a patch for html5lib that would fix these issues (this'll need to be under the MIT license)?
>> It's mainly about stuff that ET doesn't support, such as the DOCTYPE, or top-level comments. I don't know if the html5lib project is interested in that, but it shouldn't be too hard to add some conditional lxml specifics to their code.
> There is already a whole separate lxml treebuilder in html5lib.

Ah, interesting. I assume it just wasn't there at the time.

> I'm in part wondering why that wasn't used verbatim, and whether, if any issues with it are fixed in lxml 2.2's treebuilder, a patch could be made available under licensing terms acceptable to html5lib. (I'd probably look more closely to see quite what was changed if I could actually copy changes safely, the licensing being what it is.)
Come on, lxml is BSD licensed. If html5lib is MIT licensed, I doubt that anyone would be mad enough to put hope into suing you if you edit a file in one while taking a glimpse at the other.

From a quick look, it actually seems like the "etree_lxml" tree builder in html5lib has learned from the one in lxml.html already. So, please give it a review if you can. I wouldn't mind simply importing the html5lib one in lxml.html.html5parser if it's available.

Stefan
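The "import it if it's available" idea could look roughly like the sketch below. This is only an illustration, not what lxml.html.html5parser actually does: the helper name `load_html5lib_lxml_builder` is hypothetical, and the module path `html5lib.treebuilders.etree_lxml` with a `TreeBuilder` class in it is an assumption about the installed html5lib version. The helper returns `None` when the import fails, so a caller can fall back to its own tree builder.

```python
import importlib


def load_html5lib_lxml_builder():
    """Return html5lib's lxml TreeBuilder class, or None if unavailable.

    Hypothetical helper: the module path and the class name are
    assumptions about how the installed html5lib is laid out.
    """
    try:
        module = importlib.import_module("html5lib.treebuilders.etree_lxml")
    except ImportError:
        # html5lib (or lxml, which that module needs) is not installed.
        return None
    # Be defensive in case the module exists but the class was renamed.
    return getattr(module, "TreeBuilder", None)


builder_class = load_html5lib_lxml_builder()
if builder_class is None:
    print("html5lib's lxml tree builder is not available; use a local fallback")
else:
    print("using", builder_class.__module__)
```

Keeping the decision in one helper means the rest of the parser code does not need to know which of the two builders ended up being used.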