HTML DOM parser?

Peter Hansen peter at engcorp.com
Thu Jul 18 18:13:16 EDT 2002


Paul Rubin wrote:
> 
> Anyone know of a Python-callable HTML DOM parser?  I mean a serious
> one that tries to understand the crappy malformed HTML out there in the
> real-world Web, the way a browser does.  If it can interpret
> Javascript that's even better.  This is for a consulting client, so a
> commercial library would be acceptable (though not preferred).

How about automating IE using Python?

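import time
from win32com.client import DispatchEx

# Start a new Internet Explorer instance and show its window.
ie = DispatchEx('InternetExplorer.Application')
ie.Visible = 1
ie.Navigate('http://www.nightsong.com')
# Wait for the page to finish loading (READYSTATE_COMPLETE == 4).
while ie.Busy or ie.ReadyState != 4:
    time.sleep(0.1)
dom = ie.Document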

etc...

Access to the DOM tree of the document goes through COM, so it might be too
slow for your needs, but if it's not, you definitely get a lot of bang for
the buck: IE's own tolerant HTML parser, plus its Javascript engine.
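
As a rough idea of what the "etc..." could look like, here is a minimal sketch that pulls every link out of a loaded page. The helper names (wait_until_loaded, collect_links, fetch_links) are hypothetical, made up for this illustration. The sketch assumes the Busy and ReadyState properties of the IE automation object and the links collection (length, item, innerText, href) of its DOM, and it has only been reasoned through, not tuned for speed.

```python
import time

# Assumption: 4 is the READYSTATE_COMPLETE value reported by IE's automation object.
READYSTATE_COMPLETE = 4


def wait_until_loaded(browser, timeout=30.0, poll=0.1):
    """Poll the browser until the page is loaded, or raise TimeoutError."""
    deadline = time.time() + timeout
    while browser.Busy or browser.ReadyState != READYSTATE_COMPLETE:
        if time.time() > deadline:
            raise TimeoutError("page did not finish loading")
        time.sleep(poll)


def collect_links(document):
    """Return a list of (text, href) pairs from the document's links collection."""
    links = document.links
    result = []
    # Every property read here is a separate COM call when run against IE.
    for i in range(links.length):
        link = links.item(i)
        result.append((link.innerText, link.href))
    return result


def fetch_links(url):
    """Hypothetical convenience wrapper: load url in IE and return its links."""
    # Imported here so the helpers above can be tried without pywin32 installed.
    from win32com.client import DispatchEx

    ie = DispatchEx('InternetExplorer.Application')
    try:
        ie.Navigate(url)
        wait_until_loaded(ie)
        return collect_links(ie.Document)
    finally:
        # Close the browser instance even if loading failed.
        ie.Quit()
```

On a Windows machine with the win32 extensions installed, something like fetch_links('http://www.nightsong.com') should hand back the page's links. Whether the per-element COM calls are fast enough will depend on how big the pages are.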

-Peter


