Get HTML DOM tree using only basic builtin modules
Wesley
nispray at gmail.com
Thu Jun 4 07:58:23 EDT 2015
Hi guys,
I know there are many modules (builtin or not, e.g. beautifulsoup, xml, lxml, htmlparser, etc.) that parse HTML files and output the DOM tree. However, is there a good way to get the DOM tree without using any of those HTML/XML-related modules? I mean using only general standard facilities, e.g. file operations, the re module, etc.
Input file is something like this:
<html>
<head>
<title>DOM Tree test</title>
</head>
<body>
<h1>Header 1</h1>
<p>Hello world!</p>
</body>
</html>
I need the DOM tree, or just something like:
html -- head -- title(DOM Tree test)
html -- body -- h1(Header 1)
html -- body -- p(Hello world!)
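To make the request concrete, here is a rough sketch of the kind of approach I have in mind, using only the re module and a list as a stack. The names TOKEN, dom_paths and SAMPLE are made up for illustration. It assumes well-formed input in which every tag is explicitly closed, so it would not cope with void elements such as <br>, with comments, with a doctype, or with a "<" inside text or attribute values.

```python
import re

# One token is either a tag (opening or closing) or a run of text between tags.
# group(1) is "/" for a closing tag, group(2) is the tag name, group(3) is text.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")


def dom_paths(html):
    """Return one 'a -- b -- c(text)' line per leaf element.

    Assumes well-formed markup where every opening tag has a closing tag.
    """
    stack = []  # open elements, each as [name, text_parts, has_child_elements]
    paths = []
    for match in TOKEN.finditer(html):
        closing, name, text = match.group(1), match.group(2), match.group(3)
        if name is None:
            # Text node: attach non-blank text to the innermost open element.
            if stack and text.strip():
                stack[-1][1].append(text.strip())
        elif not closing:
            # Opening tag: the current innermost element now has a child.
            if stack:
                stack[-1][2] = True
            stack.append([name.lower(), [], False])
        elif stack:
            # Closing tag: pop the element and report it if it is a leaf.
            node = stack.pop()
            if not node[2]:
                names = [entry[0] for entry in stack] + [node[0]]
                paths.append(" -- ".join(names) + "(" + " ".join(node[1]) + ")")
    return paths


SAMPLE = """<html>
<head>
<title>DOM Tree test</title>
</head>
<body>
<h1>Header 1</h1>
<p>Hello world!</p>
</body>
</html>"""

if __name__ == "__main__":
    for line in dom_paths(SAMPLE):
        print(line)
```

Real-world HTML is rarely this tidy, which is why I am asking whether there is a more robust way that still avoids the dedicated parsers.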
Thanks.
Wesley