Looking for a specific html parser
dcengija_remove_ at inet.hr
Wed Mar 19 09:21:04 CET 2003
Grzegorz Adam Hankiewicz wrote:
> On Tue, Mar 18, 2003 at 09:07:47AM +0100, Davor Cengija wrote:
>> Basically, I need a DOM-like parser for HTML, with XPath
>> capabilities. xml.dom might help me, but before that I obviously
>> need some kind of HTML tidy.
> I needed something similar for a small script and found it most
> useful to first write an HTMLParser that translates all the markup
> to XML and then feed that into Python's minidom. It's quite easy to
> do if your input HTML is 'correct'; otherwise the XML parsing will
> surely fail, unless you filter everything through tidy, of course.
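For anyone following the thread, here is a rough sketch of the approach described above, written against the current Python 3 standard library (html.parser and xml.dom.minidom). The names HtmlToXml, html_to_dom and VOID_TAGS are made up for illustration, the list of void tags is deliberately partial, and this is not Grzegorz's actual script. It only re-serializes the tags it sees, so it does no tidy-style repair.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr

# Elements HTML allows without a closing tag (partial list, an assumption).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class HtmlToXml(HTMLParser):
    """Hypothetical helper: re-emit parsed HTML events as XML text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _open(self, tag, attrs, close):
        # Valueless attributes (e.g. <input disabled>) reuse their name as value.
        rendered = "".join(
            " %s=%s" % (name, quoteattr(name if value is None else value))
            for name, value in attrs
        )
        self.parts.append("<%s%s%s>" % (tag, rendered, "/" if close else ""))

    def handle_starttag(self, tag, attrs):
        # Void elements are written as self-closing so the XML stays balanced.
        self._open(tag, attrs, close=tag in VOID_TAGS)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, close=True)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were already decoded; re-escape for XML.
        self.parts.append(escape(data))


def html_to_dom(html):
    """Translate HTML text to XML and parse it with minidom."""
    converter = HtmlToXml()
    converter.feed(html)
    converter.close()
    return minidom.parseString("".join(converter.parts))
```

With well-nested input, html_to_dom(page).getElementsByTagName("p") then behaves like any other minidom query. Comments and the doctype are simply dropped, and input with unclosed or misnested tags still makes minidom raise a parse error, which is exactly the case where a tidy pass is needed first.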
I doubt all of my input will be correct HTML, so I clearly need a
tidy-like library. Unfortunately, I couldn't find a native Python tidy, only
the wrapper mentioned earlier. However, I found a Java implementation of tidy,
which could be useful together with Jython.
Davor Cengija, dcengija_remove_ at inet.hr