How to parse XHTML with xml.parsers.xmlproc?

Alex Martelli aleax at aleax.it
Mon Sep 17 04:21:49 EDT 2001


"Paavo Hartikainen" <pahartik at sci.fi> wrote in message
news:873d5mo5cz.fsf at zazu.vip.fi...
> I managed to understand enough of the xml.parsers.xmlproc to feed data
> to it.  The file I am trying to parse is valid XHTML, but I have not
> managed to make my xmlproc.XMLProcessor parse it against the correct
> DTD in the file "DTD/xhtml1-strict.dtd".
>
> These are the errors that the parser gives:
>
>   fatal: End tag for 'head' seen, but 'meta' expected
>   fatal: End tag for 'html' seen, but 'meta' expected
>   fatal: Premature document end, element 'meta' not closed
>
> I am sure it is just because it does not know XHTML DTD.  What does it
> try to parse against anyway if no DTD is defined?  Is there some kind
> of default XML DTD?

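It doesn't matter.  There is no default DTD: without one, the parser
checks only well-formedness, and that is what fails here.  You have a
<meta> tag that is not closed.  NO DTD will ever make that document
well-formed XML, period.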

Add a </meta> right after the <meta>, or change the closing >
of the meta itself into />.
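
To see both fixes at work without involving any DTD, here is a minimal
sketch.  It uses the standard library's xml.parsers.expat rather than
xmlproc (my substitution, not what you are running), and the helper
name is_well_formed and the three sample documents are made up for
illustration:

```python
import xml.parsers.expat


def is_well_formed(text):
    """Return (True, None) if text is well-formed XML, else (False, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, None


# Hypothetical documents: the first leaves <meta> open, the other two
# apply the two fixes (empty-element syntax, explicit end tag).
BROKEN = "<html><head><meta name='a' content='b'></head><body/></html>"
FIXED_EMPTY = "<html><head><meta name='a' content='b'/></head><body/></html>"
FIXED_PAIR = "<html><head><meta name='a' content='b'></meta></head><body/></html>"

for label, doc in [("broken", BROKEN),
                   ("empty-element", FIXED_EMPTY),
                   ("end tag", FIXED_PAIR)]:
    ok, message = is_well_formed(doc)
    print(label, "->", "well-formed" if ok else "NOT well-formed: " + message)
```

No DTD is mentioned anywhere in that code, yet the first document is
still rejected: the check happens before validation even starts.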


> Maybe this whole XML thing is just too complicated for me and I should
> find something else to play with...

This part is not too hard: *well-formed* comes before *valid*.
Each tag that is opened needs to be closed, with proper nesting
(and matching case) -- THIS part is very simple:-).  It's also
key difference number one between XHTML (which must be
well-formed XML: every opened tag must be closed, etc.) and HTML
(which is traditionally much laxer in checking: many tags are
routinely opened but never closed).
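
A rough way to see that difference from Python is to feed the same
sloppy markup to a lenient HTML tokenizer and to a strict XML parser.
This sketch assumes the standard library's html.parser and
xml.parsers.expat (again, not xmlproc); the names TagLogger,
html_events, xml_error and SAMPLES are hypothetical, invented for the
example:

```python
from html.parser import HTMLParser
import xml.parsers.expat


class TagLogger(HTMLParser):
    """Record start/end tag events; the HTML tokenizer does not check nesting."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(text):
    """Tokenize text as HTML and return the list of tag events."""
    logger = TagLogger()
    logger.feed(text)
    logger.close()
    return logger.events


def xml_error(text):
    """Return the expat error message for text, or None if it is well-formed."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


# Made-up snippets, one per rule: closing, nesting, case.
SAMPLES = {
    "unclosed tag": "<p>one<br>two</p>",
    "bad nesting": "<p><b>bold</p></b>",
    "case mismatch": "<P>text</p>",
}

for label, markup in SAMPLES.items():
    print(label)
    print("  as HTML:", html_events(markup))
    print("  as XML: ", xml_error(markup))
```

The HTML side just reports the tags it sees and moves on, while the
XML side refuses each snippet outright.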


Alex