Simple HTML to XML parser?

Pisel pisel at opencan.cc
Tue Nov 14 11:35:32 EST 2000


jsantaniello at my-deja.com wrote:

> Hi Everyone,
> 
> Does anyone have or know of a simple HTML to XML parser? The sax package
> is too much for me to handle. What I'm looking for is the ability to
> grab some html with urllib for example and then access an object like:
> 
> page = urlopen(url)
> the_value = page.body.form[0].hidden_element_name.value
> 
> Or something similar. What I'm doing now is just grabbing the page as a
> string, searching for tokens, and then doing some slicing. But this is
> all so hard-coded, and so subject to the vagaries of web designers, that
> I don't trust it.
> 
> Anyone have any suggestions?

If you simply want to transform HTML to XML, you could run it through tidy
(http://www.w3.org/People/Raggett/tidy/) with its -asxml option. Tidy also
has the advantage of repairing broken HTML.
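
As a rough illustration of that approach, here is a minimal sketch in current
Python 3. It is not from the original exchange: the helper names html_to_xml
and hidden_value are made up for this example, it assumes a tidy executable
is on the PATH, and the -q and -numeric flags are additions (only -asxml is
mentioned above). The output is parsed with the standard library's
xml.etree.ElementTree rather than the sax package.

```python
import subprocess
import xml.etree.ElementTree as ET


def html_to_xml(html, tidy_cmd="tidy"):
    """Run HTML through `tidy -asxml` and return the resulting XML as text.

    Hypothetical helper: assumes a `tidy` executable is installed.
    """
    # -asxml is the option from the post. -q (quiet) and -numeric (write
    # numeric entities, so the XML parser needs no DTD) are assumptions
    # added for this sketch.
    result = subprocess.run(
        [tidy_cmd, "-q", "-asxml", "-numeric"],
        input=html,
        capture_output=True,
        text=True,
    )
    # tidy exits with 1 when it only issued warnings and 2 on errors.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.stdout


def hidden_value(xml_text, name):
    """Return the value of the first hidden <input> called `name`, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Drop any "{namespace}" prefix so XHTML and plain tags both match.
        local_tag = element.tag.rsplit("}", 1)[-1]
        if (local_tag == "input"
                and element.get("type") == "hidden"
                and element.get("name") == name):
            return element.get("value")
    return None
```

With something like this, the page text fetched with urllib could be passed
to html_to_xml, and hidden_value(xml_text, "hidden_element_name") would then
stand in for the page.body.form[0].hidden_element_name.value lookup in the
question. Searching by element and attribute name, rather than slicing the
string at fixed positions, should hold up better when the page layout
changes.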

Pisel


