How to parse XHTML with xml.parsers.xmlproc?
aleax at aleax.it
Mon Sep 17 10:21:49 CEST 2001
"Paavo Hartikainen" <pahartik at sci.fi> wrote in message
news:873d5mo5cz.fsf at zazu.vip.fi...
> I managed to understand enough of the xml.parsers.xmlproc to feed data
> to it. File I am trying to parse is valid XHTML but I have not
> managed to make my xmlproc.XMLProcessor to parse according to correct
> DTD in file "DTD/xhtml1-strict.dtd".
> These are the errors that the parser gives:
> fatal: End tag for 'head' seen, but 'meta' expected
> fatal: End tag for 'html' seen, but 'meta' expected
> fatal: Premature document end, element 'meta' not closed
> I am sure it is just because it does not know XHTML DTD. What does it
> try to parse against anyway if no DTD is defined? Is there some kind
> of default XML DTD?
It doesn't matter. You have a <meta> tag that is never closed, so
the document is not even well-formed XML, and NO DTD will ever make
it valid, period. Add a </meta> right after the <meta>, or change
the closing > of the meta tag itself into />.
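
To see this without involving any DTD, here is a minimal sketch of a
well-formedness check. It uses the standard library's
xml.parsers.expat (a non-validating parser) rather than xmlproc, and
the helper name check_well_formed and the three sample documents are
made up for illustration:

```python
import xml.parsers.expat


def check_well_formed(text):
    """Return None if text is well-formed XML, else the parser's error message.

    Hypothetical helper: expat never looks at a DTD for validity, so any
    error reported here is purely a well-formedness problem.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


# Illustrative documents, not the original poster's file.
UNCLOSED = "<html><head><meta name='a' content='b'></head><body/></html>"
PAIRED = "<html><head><meta name='a' content='b'></meta></head><body/></html>"
SELF_CLOSED = "<html><head><meta name='a' content='b'/></head><body/></html>"

for label, doc in [("unclosed", UNCLOSED),
                   ("paired", PAIRED),
                   ("self-closed", SELF_CLOSED)]:
    error = check_well_formed(doc)
    print(label, "->", "well-formed" if error is None else error)
```

Only the first document should be rejected; both fixes suggested
above get through the parser.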
> Maybe this whole XML thing is just too complicated for me and I should
> find something else to play with...
This part is not too hard: *well-formed* comes before *valid*.
Every tag that is opened needs to be closed, with proper nesting
(and matching case, since XML names are case-sensitive) -- THIS
part is very simple :-). It is also key difference number one
between XHTML (which must be well-formed XML: all opened tags need
to be closed, etc.) and HTML (which is traditionally much laxer in
checking: many tags are routinely opened but never closed).
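
The contrast can be sketched with two standard-library parsers,
assuming a modern Python: html.parser stands in for the lax HTML
side, and xml.parsers.expat for the strict XML side. The class name
TagLogger, the helper xml_accepts, and the sample strings are
invented for this illustration:

```python
from html.parser import HTMLParser
import xml.parsers.expat


class TagLogger(HTMLParser):
    """Record start/end tag events; the HTML parser does not enforce closing."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(text):
    """Return True if expat parses text without a well-formedness error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Illustrative inputs: an unclosed <meta>, and a case mismatch.
UNCLOSED_META = "<html><head><meta name='a' content='b'></head></html>"
CASE_MISMATCH = "<P>some text</p>"

logger = TagLogger()
logger.feed(UNCLOSED_META)
logger.close()

print(logger.events)                 # HTML side: tags reported, no complaint
print(xml_accepts(UNCLOSED_META))    # XML side: rejected
print(xml_accepts(CASE_MISMATCH))    # XML side: rejected, names are case-sensitive
```

The HTML parser simply reports the tags it sees and never complains
about the missing </meta>, while the XML parser refuses both inputs.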