Working with HTML5 documents

Denis McMahon denismfmcmahon at gmail.com
Thu Nov 20 18:03:42 CET 2014


On Wed, 19 Nov 2014 13:43:17 -0800, Novocastrian_Nomad wrote:

> On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
>> So what I'm looking for is a method to create an HTML5 document using
>> "DOM manipulation", i.e.:
>> 
>> doc = new htmldocument(doctype="HTML")
>> html = new html5element("html")
>> doc.appendChild(html)
>> head = new html5element("head")
>> html.appendChild(head)
>> body = new html5element("body")
>> html.appendChild(body)
>> title = new html5element("title")
>> txt = new textnode("This Is The Title")
>> title.appendChild(txt)
>> head.appendChild(title)
>> para = new html5element("p")
>> txt = new textnode("This is some text.")
>> para.appendChild(txt)
>> body.appendChild(para)
>> 
>> print(doc.serialise())
>> 
>> generates:
>> 
>> <!doctype HTML><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>
>> 
>> I'm finding various mechanisms to generate the structure from an
>> existing piece of HTML (e.g. html5lib, BeautifulSoup, etc.), but I can't
>> seem to find any mechanism to generate, manipulate and produce HTML5
>> documents using this DOM-manipulation approach. Where should I be
>> looking?

> Use a search engine (Google, DuckDuckGo, etc.) and search for 'python
> write html'

Surprise, surprise: I'd already tried that, and I can't find anything that
holds the document in the kind of tree structure I want to manipulate.
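For comparison, here is a minimal sketch of the wished-for API using the standard library's `xml.dom.minidom`, which exposes W3C DOM methods (`createElement`, `createTextNode`, `appendChild`). This is a swapped-in technique, not an HTML5-aware DOM: it assumes well-formed markup, writes childless elements in XML's `<tag/>` form, and `Document.toxml()` would prepend an XML declaration, so the hypothetical helper `serialise` writes the doctype and the root element separately.

```python
from xml.dom.minidom import getDOMImplementation


def serialise(doc):
    # Document.toxml() would start with '<?xml version="1.0" ?>', so write
    # the doctype node and the root element on their own instead.
    return doc.doctype.toxml() + doc.documentElement.toxml()


impl = getDOMImplementation()
doctype = impl.createDocumentType("html", None, None)
# createDocument() attaches the doctype and creates the root <html> element.
doc = impl.createDocument(None, "html", doctype)
html = doc.documentElement

head = doc.createElement("head")
html.appendChild(head)
body = doc.createElement("body")
html.appendChild(body)

title = doc.createElement("title")
title.appendChild(doc.createTextNode("This Is The Title"))
head.appendChild(title)

para = doc.createElement("p")
para.appendChild(doc.createTextNode("This is some text."))
body.appendChild(para)

print(serialise(doc))
```

Run as-is, this prints the same single-line document shown in the quoted request, with `<!DOCTYPE html>` in place of `<!doctype HTML>`.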

Everything there seems to assume I'll be creating a document serially,
e.g. that I won't get to some point in the document and decide that I want
to add an element earlier.
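A minimal sketch of going back to add an element earlier, again assuming the standard library's `xml.dom.minidom` rather than an HTML5 library: `insertBefore(new, ref)` places a node ahead of an existing sibling, and passing `parent.firstChild` as `ref` would put it first. The `<h1>` text is a hypothetical placeholder.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
doc = impl.createDocument(None, "html", impl.createDocumentType("html", None, None))
html = doc.documentElement
body = doc.createElement("body")
html.appendChild(body)

# Build the document "serially" first...
para = doc.createElement("p")
para.appendChild(doc.createTextNode("This is some text."))
body.appendChild(para)

# ...then decide a heading belongs before the paragraph that already exists.
heading = doc.createElement("h1")
heading.appendChild(doc.createTextNode("A Heading"))
body.insertBefore(heading, para)

print(doc.documentElement.toxml())
```

Because the whole document stays in memory as a tree until it is serialised, the insertion point can be anywhere, not just the end.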

bs4 and html5lib will parse a document into a tree structure, but they're
not so hot on manipulating that tree, e.g. adding and moving nodes.
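As a point of comparison (using `xml.dom.minidom` from the standard library, not bs4 or html5lib), moving a node in a DOM tree can be a single call: minidom's `appendChild()` and `insertBefore()` detach a node from its current parent before reinserting it. The markup below is made up for illustration and must be well-formed, since minidom is an XML parser.

```python
from xml.dom.minidom import parseString

doc = parseString("<html><body><p>first</p><p>second</p></body></html>")
body = doc.getElementsByTagName("body")[0]
first, second = body.getElementsByTagName("p")

# Re-appending a node that is already in the tree moves it: minidom removes
# it from its old position first, so no explicit removeChild() is needed.
body.appendChild(first)

print(doc.documentElement.toxml())
```

This prints `<html><body><p>second</p><p>first</p></body></html>`; `body.removeChild(node)` would drop a node outright.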

Actually, it looks like bs4 is going to be my best bet: although limited,
it does have most of what I'm looking for. I just need to start by giving
it "<html></html>" to parse.

-- 
Denis McMahon, denismfmcmahon at gmail.com



More information about the Python-list mailing list