Simple HTML to XML parser?
Pisel
pisel at opencan.cc
Tue Nov 14 11:35:32 EST 2000
jsantaniello at my-deja.com wrote:
> Hi Everyone,
>
> Does anyone have or know of a simple HTML to XML parser? The sax package
> is too much for me to handle. What I'm looking for is the ability to
> grab some html with urllib for example and then access an object like:
>
> page = urlopen(url)
> the_value = page.body.form[0].hidden_element_name.value
>
> Or something similar. What I'm doing now is just grabbing the page as a
> string and searching for tokens and then doing some slicing. But this is
> all so hard coded, and subject to the vagaries of web-designers that I
> don't trust it.
>
> Anyone have any suggestions?
If you simply want to transform HTML to XML, you could call tidy
(http://www.w3.org/People/Raggett/tidy/) with its -asxml option. It has the
added advantage of repairing broken HTML.
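Here is a minimal sketch of that approach in modern Python 3 (not the Python of 2000). It assumes a `tidy` executable is on the PATH and reads HTML from stdin when given no file. The `-asxml` option comes from the reply above. The `-numeric` and `-q` options are standard tidy command-line switches, but check your tidy version. Once tidy's output is well-formed XHTML, the standard `xml.etree.ElementTree` module can look up a hidden form field by name instead of relying on string slicing. The helper names `html_to_xhtml`, `local_name` and `hidden_value` are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET


def html_to_xhtml(html, tidy_cmd="tidy"):
    """Run HTML Tidy on an HTML string and return well-formed XHTML.

    Assumption: a `tidy` executable is on PATH. -asxml converts to XHTML,
    -numeric writes entities such as &nbsp; as numeric references so a plain
    XML parser can read them, and -q suppresses the informational banner.
    """
    result = subprocess.run(
        [tidy_cmd, "-asxml", "-numeric", "-q"],
        input=html,
        capture_output=True,
        text=True,
    )
    # Tidy exits with 0 (clean), 1 (warnings only) or 2 (errors).
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


def local_name(tag):
    """Strip an ElementTree namespace prefix such as '{http://...}input'."""
    return tag.rsplit("}", 1)[-1]


def hidden_value(xhtml, name, form_index=0):
    """Return the value of the named hidden <input> in the nth <form>, or None."""
    root = ET.fromstring(xhtml)
    # Tidy's XHTML usually carries the XHTML namespace, so compare local names.
    forms = [e for e in root.iter() if local_name(e.tag) == "form"]
    for field in forms[form_index].iter():
        if (local_name(field.tag) == "input"
                and field.get("type", "").lower() == "hidden"
                and field.get("name") == name):
            return field.get("value")
    return None
```

With a page fetched through `urllib.request.urlopen` and decoded to a string, `hidden_value(html_to_xhtml(page_source), "session_id")` would then return the hidden field's value. The field name `"session_id"` is only an example. Letting tidy repair the markup first means the lookup depends on the document's structure rather than on exact character positions.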
Pisel