How to use XML parsing tools on this one specific URL?
Stefan Behnel
stefan.behnel-n05pAM at web.de
Mon Mar 5 12:34:49 EST 2007
seberino at spawar.navy.mil wrote:
> I understand that the web is full of ill-formed XHTML web pages, but
> this is Microsoft:
>
> http://moneycentral.msn.com/companyreport?Symbol=BBBY
>
> I can't validate it, and xml.dom.minidom.parseString won't work on it.
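A quick way to see the failure described above: minidom is a strict XML parser, so typical HTML tag soup raises an ExpatError instead of producing a DOM. The markup below is a small, hypothetical stand-in, not the actual page, and `parses_as_xml` is an illustrative helper name.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical stand-in for real-world tag soup: an unquoted attribute
# value plus unclosed <p> and <br> elements.
TAG_SOUP = "<html><body><p class=report>Price<br>42<p>Volume</body></html>"


def parses_as_xml(markup: str) -> bool:
    """Return True if minidom accepts the markup as well-formed XML."""
    try:
        minidom.parseString(markup)
    except ExpatError:
        # Expat rejects anything that is not well-formed XML.
        return False
    return True


print(parses_as_xml(TAG_SOUP))                # False
print(parses_as_xml("<html><body/></html>"))  # True
```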
Interestingly, no one has mentioned lxml so far:
http://codespeak.net/lxml
http://codespeak.net/lxml/dev/parsing.html#parsers
Parse it as HTML, and then use anything from XPath to XSLT to process it.
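A minimal sketch of that approach, assuming lxml is installed. Instead of downloading the page, it parses a small, hypothetical stand-in snippet (the `TAG_SOUP` table and its `id=quote` attribute are invented for illustration). lxml.html repairs the unclosed tags, after which the same tree can be queried with XPath or transformed with an XSLT stylesheet.

```python
from lxml import etree, html

# Hypothetical stand-in for the downloaded page: unclosed <tr>/<td>
# tags and an unquoted attribute value, as commonly found in real HTML.
TAG_SOUP = """<html><body>
<table id=quote>
<tr><td>Symbol<td>BBBY
<tr><td>Price<td>42.00
</table>
</body></html>"""

# lxml.html uses libxml2's forgiving HTML parser, so the markup above
# is repaired into a tree instead of raising a parse error.
doc = html.fromstring(TAG_SOUP)

# XPath: collect label/value pairs from the repaired table rows.
rows = doc.xpath("//table[@id='quote']//tr")
pairs = {
    str(row.xpath("normalize-space(td[1])")): str(row.xpath("normalize-space(td[2])"))
    for row in rows
}
print(pairs)  # {'Symbol': 'BBBY', 'Price': '42.00'}

# XSLT: the same tree can be fed to a stylesheet; this one emits
# one "label=value" line per table row as plain text.
XSL = etree.XML("""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//table[@id='quote']//tr">
      <xsl:value-of select="normalize-space(td[1])"/>=<xsl:value-of select="normalize-space(td[2])"/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>""")
transform = etree.XSLT(XSL)
print(str(transform(doc.getroottree())))
```

With a real page, `html.parse()` or `html.fromstring()` on the fetched bytes gives the same kind of tree, though the XPath expressions would of course have to match that page's actual structure.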
Have fun,
Stefan
More information about the Python-list mailing list