Problem with xml.dom parser and xmlns attribute
R.Brodie at rl.ac.uk
Fri Apr 23 11:03:42 CEST 2004
"Peter Maas" <peter.maas at mplusr.de> wrote in message news:c68jai$g85$1 at swifty.westend.com...
> Thanks, Richard. But on the Internet most of the time I don't know
> what kind of document I'm dealing with when I start parsing. I guess
> I should use HTMLParser (?).
If you're dealing with a wide range of web pages, chances are they
will have all manner of rubbish in them. I would probably feed the
stuff through Tidy (or uTidyLib) first, to convert to cleanish XHTML,
then use an XML parser.
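
A rough sketch of that two-step approach, in case it helps. It assumes the
command-line tidy program is installed and on your PATH (uTidyLib would do
the same job in-process); the function names tidy_to_xhtml and parse_xhtml
are made up for illustration, not part of any library.

```python
import subprocess
from xml.dom import minidom


def tidy_to_xhtml(html, tidy_cmd="tidy"):
    """Pipe messy HTML through the external tidy program, return XHTML bytes.

    Assumes an HTML Tidy executable is reachable as `tidy_cmd`.
    """
    result = subprocess.run(
        [
            tidy_cmd,
            "-asxhtml",                # convert HTML to well-formed XHTML
            "-numeric",                # numeric entities, so the XML parser
                                       # need not know names like &nbsp;
            "-quiet",
            "-utf8",
            "--show-warnings", "no",
            "--force-output", "yes",   # emit output even for bad markup
        ],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # tidy exits with 1 on warnings and 2 on errors, so the exit status
    # alone is not a failure signal here; an empty result is.
    if not result.stdout:
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout


def parse_xhtml(xhtml):
    """Parse cleaned-up XHTML (str or bytes) into a minidom Document."""
    return minidom.parseString(xhtml)
```

Usage would be along the lines of doc = parse_xhtml(tidy_to_xhtml(page)),
followed by the usual DOM calls such as doc.getElementsByTagName("a").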