Simple HTML to XML parser?

Tom Smith tom.smith at i.am
Tue Nov 14 04:03:07 EST 2000


In article <8uq132$rek$1 at nnrp1.deja.com>,
  jsantaniello at my-deja.com wrote:
> Hi Everyone,
>
> Does anyone have or know of a simple HTML to XML parser? The sax package
> is too much for me to handle. What I'm looking for is the ability to
> grab some html with urllib for example and then access an object like:
>
> page = urlopen(url)
> the_value = page.body.form[0].hidden_element_name.value
>
> Or something similar. What I'm doing now is just grabbing the page as a
> string and searching for tokens and then doing some slicing. But this is
> all so hard coded, and subject to the vagaries of web-designers that I
> don't trust it.

I'm not completely sure about this, but I think you need a different
type of parser. The sax stuff reads the document as a stream and fires
off events when tags are started and ended; it never builds anything
you can navigate afterwards.
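
To make the event idea concrete, here is a minimal sketch using the xml.sax module from the standard library (written for a current Python, not 1.5 or 2.0). The names TagLogger and tag_events are made up for illustration, and it assumes the input is well-formed XML, which real-world HTML often is not.

```python
import xml.sax


class TagLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records every start/end tag event."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is seen.
        self.events.append(("start", name))

    def endElement(self, name):
        # Called by the parser each time a closing tag is seen.
        self.events.append(("end", name))


def tag_events(xml_text):
    """Parse well-formed XML text and return the list of tag events."""
    handler = TagLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

Notice that all you get back is a flat sequence of events. Anything like page.body.form[0] has to be reconstructed by your own handler code, which is probably why sax feels like too much here.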

What I think you want is something that reads the HTML into a
tree-like structure that you can navigate. If you're using python 1.5,
Pyxie does this...wanna buy a book :-)  ? ....with 2.0 I've had a modicum
of success with minidom.py in the /lib/xml/dom/ folder.

It reads the whole thing into a tree that you can then walk with
.childNodes and getElementsByTagName() (sorta)
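
Something along these lines is what I mean, as a rough sketch with xml.dom.minidom on a current Python. The function name hidden_value and the sample field name "token" are invented for the example. It also assumes the page is well-formed XML (XHTML-ish); minidom will raise an error on sloppy HTML, so treat this as an illustration of the tree approach rather than a robust scraper.

```python
from xml.dom import minidom


def hidden_value(markup, field_name):
    """Return the value of a hidden <input> in the first <form>, or None.

    Assumes `markup` is well-formed XML; hypothetical helper for illustration.
    """
    doc = minidom.parseString(markup)
    forms = doc.getElementsByTagName("form")
    if not forms:
        return None
    # Look only inside the first form, like page.body.form[0] in the question.
    for node in forms[0].getElementsByTagName("input"):
        if (node.getAttribute("type") == "hidden"
                and node.getAttribute("name") == field_name):
            return node.getAttribute("value")
    return None


page = (
    '<html><body><form>'
    '<input type="hidden" name="token" value="abc123"/>'
    '</form></body></html>'
)
the_value = hidden_value(page, "token")
```

It is not quite page.body.form[0].hidden_element_name.value, but it looks the field up by name instead of by string position, so it should survive a redesign better than slicing.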

hope this helps

tom




