How to use XML parsing tools on this one specific URL?

Stefan Behnel stefan.behnel-n05pAM at web.de
Mon Mar 5 12:34:49 EST 2007


seberino at spawar.navy.mil wrote:
> I understand that the web is full of ill-formed XHTML web pages but
> this is Microsoft:
> 
> http://moneycentral.msn.com/companyreport?Symbol=BBBY
> 
> I can't validate it and xml.dom.minidom.parseString won't work on it.

Interestingly, no one has mentioned lxml so far:

http://codespeak.net/lxml
http://codespeak.net/lxml/dev/parsing.html#parsers

Parse it as HTML and then use anything from XPath to XSLT to process it.
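
A minimal sketch of that approach, assuming lxml is installed. The
markup below is a made-up, deliberately ill-formed stand-in for the
real page (the table layout, class names and prices are hypothetical),
and the helper names parse_broken_html and extract_symbols are only
for illustration. In practice you would download the page first and
pass its content to the parser instead.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page. It is ill-formed on purpose:
# unquoted attribute, bare ampersand, unclosed <p> and <br>, missing </html>.
SAMPLE_HTML = """
<html><body>
<table id=quotes>
<tr><td class="symbol">BBBY</td><td class="price">40.12</td></tr>
<tr><td class="symbol">MSFT</td><td class="price">27.55</td></tr>
</table>
<p>Bed Bath & Beyond<br>
</body>
"""


def parse_broken_html(text):
    """Parse tag soup with lxml's forgiving HTML parser; return the root element."""
    parser = etree.HTMLParser()
    return etree.fromstring(text, parser)


def extract_symbols(root):
    """Use XPath to collect the text of every <td class="symbol"> cell."""
    cells = root.xpath('//td[@class="symbol"]')
    return [(cell.text or "").strip() for cell in cells]


root = parse_broken_html(SAMPLE_HTML)
symbols = extract_symbols(root)
print(symbols)
```

The same tree can be handed to etree.XSLT if a stylesheet suits the
job better than individual XPath queries.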

Have fun,
Stefan


