How to use XML parsing tools on this one specific URL?
fredrik at pythonware.com
Mon Mar 5 09:55:01 CET 2007
skip at pobox.com wrote:
> Chris> http://moneycentral.msn.com/companyreport?Symbol=BBBY
> Chris> I can't validate it and xml.dom.minidom.parseString won't work on
> Chris> it.
> Chris> If this was just some teenager's web site I'd move on. Is there
> Chris> any hope avoiding regular expression hacks to extract the data
> Chris> from this page?
> Tidy it perhaps or use BeautifulSoup? ElementTree can use tidy if it's
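Chris's complaint is easy to reproduce without the MSN page. The sketch below uses an invented fragment (`TAG_SOUP` is a hypothetical sample, not taken from moneycentral.msn.com) containing three things that are common in real-world HTML but illegal in XML: an unquoted attribute value, the `&nbsp;` entity, and an unclosed `<br>`. `xml.dom.minidom.parseString` is a strict XML parser, so it raises `xml.parsers.expat.ExpatError` on the first such problem it meets. `first_xml_error` is a hypothetical helper name.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: unquoted attribute, &nbsp; entity, and an unclosed <br>.
TAG_SOUP = ("<html><body><table><tr><td class=sym>BBBY</td>"
            "<td>42.10&nbsp;USD</td></tr></table><br></body></html>")


def first_xml_error(text):
    """Return the ExpatError message for text, or None if it is well-formed XML."""
    try:
        minidom.parseString(text)
    except ExpatError as exc:
        return str(exc)
    return None


print(first_xml_error(TAG_SOUP))                    # prints the parser's error message
print(first_xml_error("<p class='sym'>BBBY</p>"))  # prints None: well-formed XML
```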
ElementTree can also use BeautifulSoup:
As noted on that page, tidy is a bit too picky for this kind of use; it's better suited
for "normalizing" HTML that you're producing yourself than for parsing arbitrary HTML.
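Tidy and BeautifulSoup are separate packages. As a rough standard-library stand-in (not effbot's ElementSoup module and not BeautifulSoup itself), the sketch below feeds the events of Python 3's forgiving `html.parser.HTMLParser` into an `xml.etree.ElementTree.TreeBuilder`, so messy markup still ends up as an ordinary ElementTree you can search with `iter()` and `find()`. `TreeBuildingParser`, `parse_html`, `VOID_TAGS`, and the sample markup are hypothetical names invented for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take content or a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class TreeBuildingParser(HTMLParser):
    """Translate lenient HTMLParser callbacks into ElementTree TreeBuilder calls."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; etc. into text
        self._builder = ET.TreeBuilder()
        self._open = ["document"]                 # synthetic root element
        self._builder.start("document", {})

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <td nowrap>) arrive with value None.
        attrib = {name: value or "" for name, value in attrs}
        self._builder.start(tag, attrib)
        if tag in VOID_TAGS:
            self._builder.end(tag)               # <br>, <img>, ... close at once
        else:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Stray end tags with no matching open element are dropped.
        if tag not in self._open[1:]:
            return
        # Close everything opened since the matching start tag; this
        # implicitly closes elements the page author forgot to close.
        while True:
            current = self._open.pop()
            self._builder.end(current)
            if current == tag:
                break

    def handle_data(self, data):
        self._builder.data(data)

    def close(self):
        """Finish parsing, close any still-open elements, return the root."""
        super().close()
        while self._open:
            self._builder.end(self._open.pop())
        return self._builder.close()


def parse_html(text):
    parser = TreeBuildingParser()
    parser.feed(text)
    return parser.close()


SAMPLE = ("<html><body><table><tr><td class=sym>BBBY</td>"
          "<td>42.10&nbsp;USD</td></tr></table><br><p>no closing tag"
          "</body></html>")

root = parse_html(SAMPLE)
print([td.get("class") for td in root.iter("td")])
print(["".join(td.itertext()) for td in root.iter("td")])
```

Unlike BeautifulSoup, this sketch does not apply HTML's implicit-closing rules (a second `<p>` nests inside the first instead of closing it), so for pages as messy as the one Chris quoted, a dedicated tag-soup parser is still the better tool.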